Best non-trigonometric floating point approximation of tanh(x) in 10 instructions or fewer

By Ritesh Sahu - September 24, 2022

Description

I need a reasonably accurate, fast hyperbolic tangent for a machine that has no built-in floating point trigonometry, so, for example, the usual formula tanh(x) = (exp(2x) - 1) / (exp(2x) + 1) would itself need an approximation of exp(2x).
All other instructions, such as addition, subtraction, multiplication, division, and even FMA (MUL + ADD in one operation), are present.

Right now I have several approximations, but none of them are satisfactory in terms of accuracy.

[Update from the comments:]

An instruction for trunc()/floor() is available.
There is a way to transparently reinterpret floats as integers and do all kinds of bit ops.
There is a family of instructions called SEL.xx (.GT, .LE, etc.) which compare 2 values and choose what to write to the destination.
DIVs are only twice as slow, which is nothing exceptional, so they are okay to use.

Approach 1

$\tanh(x)\approx{{x\left({36\over73}+x^2\right)}\over{{32\over73}+\left|x\left({36\over73}+x^2\right)\right|}}$

Accuracy: ±1.2% absolute error, see here.

Pseudocode (A = accumulator register, T = temporary register):

[1] FMA T, 36.f / 73.f, A, A   // T := 36/73 + X^2
[2] MUL A, A, T                // A := X(36/73 + X^2)
[3] ABS T, A                   // T := |X(36/73 + X^2)|
[4] ADD T, T, 32.f / 73.f      // T := |X(36/73 + X^2)| + 32/73
[5] DIV A, A, T                // A := X(36/73 + X^2) / (|X(36/73 + X^2)| + 32/73)
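
To make the accuracy figure easier to check, here is a minimal Python sketch of Approach 1 with a brute-force error scan. It assumes double precision rather than float32, and the helper max_abs_error, its grid range of [-6, 6], and the step count are illustrative choices, not part of the original question.

```python
import math

C_NUM = 36.0 / 73.0   # constant added to x^2 in the numerator term
C_DEN = 32.0 / 73.0   # constant added to |u| in the denominator

def tanh_approx1(x):
    """Approach 1: u / (32/73 + |u|) with u = x * (36/73 + x^2)."""
    u = x * (C_NUM + x * x)      # instructions [1] and [2]
    return u / (C_DEN + abs(u))  # instructions [3] to [5]

def max_abs_error(f, lo=-6.0, hi=6.0, steps=12001):
    """Largest |f(x) - tanh(x)| on an evenly spaced grid (hypothetical helper)."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(f(x) - math.tanh(x)))
    return worst

# Prints the worst absolute error seen on the grid.
print(max_abs_error(tanh_approx1))
```

Because the output is bounded by construction (|u| / (32/73 + |u|) is always below 1), this variant needs no MIN/MAX clamp, which is why it fits in 5 instructions.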

Approach 2

$\tanh(x)\approx\min\left(\max\left(0.1073x\left(1+{{25.125}\over{x^2+3.125}}\right),-1\right),1\right)$

Accuracy: ±0.9% absolute error, see here.

Pseudocode (A = accumulator register, T = temporary register):

[1] FMA T, 3.125f, A, A        // T := 3.125 + X^2
[2] DIV T, 25.125f, T          // T := 25.125/(3.125 + X^2)
[3] MUL A, A, 0.1073f          // A := 0.1073*X
[4] FMA A, A, A, T             // A := 0.1073*X + 0.1073*X*25.125/(3.125 + X^2)
[5] MIN A, A, 1.f              // A := min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1)
[6] MAX A, A, -1.f             // A := max(min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1), -1)
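
The instruction sequence above can be traced one step at a time. The following Python sketch mirrors each pseudocode instruction with one statement, reading FMA D, c, a, b as D := c + a*b, which is the convention the comments in the pseudocode imply. It runs in double precision, not float32, and the scan range of [-8, 8] is an arbitrary assumption.

```python
import math

def tanh_approx2(x):
    """Approach 2, one Python statement per pseudocode instruction."""
    a = x
    t = 3.125 + a * a    # [1] FMA: T := 3.125 + X^2
    t = 25.125 / t       # [2] DIV: T := 25.125 / (3.125 + X^2)
    a = a * 0.1073       # [3] MUL: A := 0.1073 * X
    a = a + a * t        # [4] FMA: A := A + A * T
    a = min(a, 1.0)      # [5] MIN: clamp from above
    a = max(a, -1.0)     # [6] MAX: clamp from below
    return a

# Worst absolute error against math.tanh on a grid with step 0.001.
worst = max(abs(tanh_approx2(i / 1000.0) - math.tanh(i / 1000.0))
            for i in range(-8000, 8001))
print(worst)
```

The clamp is needed here because the unclamped expression grows like 0.1073*x for large |x| and would otherwise leave the interval [-1, 1].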

Approach 3

$\tanh(x)\approx\min\left(\max\left(x{{\left({x^2+52.5}\over\sqrt{15}\right)^2-120.75}\over{\left(x^2+14\right)^2-133}},-1\right),1\right)$

Accuracy: ±0.13% absolute error, see here.

Pseudocode (A = accumulator register, T = temporary register):

[1] FMA T, 14.f, A, A          // T := 14 + X^2
[2] FMA T, -133.f, T, T        // T := (14 + X^2)^2 - 133
[3] DIV T, A, T                // T := X/((14 + X^2)^2 - 133)
[4] FMA A, 52.5f, A, A         // A := 52.5 + X^2
[5] MUL A, A, RSQRT(15.f)      // A := (52.5 + X^2)/sqrt(15)
[6] FMA A, -120.75f, A, A      // A := (52.5 + X^2)^2/15 - 120.75
[7] MUL A, A, T                // A := ((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133)
[8] MIN A, A, 1.f              // A := min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1)
[9] MAX A, A, -1.f             // A := max(min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1), -1)
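
Expanding the squares in Approach 3 gives the ratio x(x^4 + 105x^2 + 945) / (15x^4 + 420x^2 + 945), so the completed-square form appears to be a way of evaluating that rational function with fewer instructions. The Python sketch below compares the two forms numerically and scans the error against math.tanh. It uses double precision rather than float32, and the name tanh_rational and the grid are assumptions made for illustration.

```python
import math

def tanh_approx3(x):
    """Approach 3 in the completed-square form used by the pseudocode."""
    x2 = x * x
    num = ((x2 + 52.5) / math.sqrt(15.0)) ** 2 - 120.75
    den = (x2 + 14.0) ** 2 - 133.0   # always >= 63, so no division by zero
    return max(min(x * num / den, 1.0), -1.0)

def tanh_rational(x):
    """The same function written as a ratio of expanded polynomials."""
    x2 = x * x
    num = x * (x2 * x2 + 105.0 * x2 + 945.0)
    den = 15.0 * x2 * x2 + 420.0 * x2 + 945.0
    return max(min(num / den, 1.0), -1.0)

grid = [i / 1000.0 for i in range(-8000, 8001)]

# Gap between the two algebraic forms (rounding noise only, if they agree).
form_gap = max(abs(tanh_approx3(x) - tanh_rational(x)) for x in grid)

# Worst absolute error of Approach 3 against math.tanh on the grid.
worst = max(abs(tanh_approx3(x) - math.tanh(x)) for x in grid)
print(form_gap, worst)
```

The expanded form needs more multiplications per evaluation, which is presumably why the pseudocode keeps the two completed squares and folds the factor 1/15 into a single multiplication by a precomputed constant.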

The question

Is there anything better that can possibly fit in 10 non-trigonometric float32 instructions?
