Computing the reciprocals

Computing reciprocals of single-precision floats

Given a non-zero number d, consider the problem of finding its reciprocal 1/d. The problem is not trivial for the low- and mid-range micro-controllers as they lack the division operation. We assume that d is represented in a normalized floating-point format

d = s · 2^e · m with 1 < m < 2

where s is the sign bit, e is the order, and m is a normalized mantissa (i.e. a number is the range between 1 and 2). The reciprocal of d is then of the form

1/d = s · 2^-(e+1) · (1/m)

The extra 1 in the exponent is due to the fact that 1/2 < 1/m < 1 and we assume that 1/m is normalized, i.e. shifted one bit to the left. A natural approach to compute 1/m is to use a table. However, if the mantissa takes, say, 24 bits the table size becomes impractical. The main idea, which is also used in [2], is to use a small table for an initial approximation of 1/m and then refine it by applying some iterative methods. Our problem can be viewed as finding a root of the function f(x) = d·x - 1. One could try to apply the Newton-Rapson method based on the formula

x_n+1 = x_n - f(x_n) / f'(x_n)

where f'(x) is the derivative of f(x). This way we would still have a problem because the above formula involves a division. However, there are some other known methods that do not involve the division operation. One of them is the Fixed-Point Method for finding the solution to the equation g(x) = x. We apply this method with g(x) = x - f(x). This way the initial equation f(x) = 0 is equivalent to g(x) = x. The recursive formula involved into the Fixed-Point Method is as follows

x_n+1 = g(x_n) = x_n - f(x_n) = x_n - (d·x_n - 1)

For this schema to convert one has to have |g'(x)|<1 which is equivalent to m<2 and, thus, satisfied for the numbers in the range. However, the schema has a pretty slow conversion and required about 20 iterations to reach the accuracy of 2^-23. A better non-linear schema can be derived by using a trick of replacing the original equation d·x - 1 = 0 with d·x² - x = 0. So, with f(x) = d·x² - x and g(x) = x - f(x) the equation x = g(x) leads to the following iteration schema

x_n+1 = g(x_n) = x_n - f(x_n) = x_n - (d·(x_n)² - x_n) = 2·x_n - d·(x_n)² = x_n·(2 - d·x_n)

The convergency condition |g'(x)|<1 in this case translates to x_n>1/2d which is satisfied. This schema needs just 2 iterations to provide an accuracy of 2^-23 for a 23-bit mantissa. This corresponds to the single-precision accuracy (data type float) of high-level programming languages like C or Java. It, however, involves two multiplications instead of just one above. Since software-implemented 32-bit multiplication takes up to 500 CPU cycles in modern 8-bit micro-controllers without a multiplication support in hardware and is at least twice slower than addition, there is a trade-off between the speed and complexity.

In either case to apply the iterations we need a reasonable initial approximation to 1/d. This can be achieved by a table with 127 or 256 entries that takes the high-order 7 or 8 bits of m as an address and returns the high-order byte approximation to 1/m.

Test codes

References

[1] T. Lynch, A. Ahmed, M. Schulte: Rounding Error Analysis for Division, AMD Technical Report, 1995.
[2] J.S. Moore, T. Lynch, M. Kaufmann: A Mechanically Checked proof of the correctness of the kernel of the AMD5_K86™ floating-point division algorithm, IEEE Transactions on Computers 47(9), 913-926, 1998.

printTitle()

Computing reciprocals of single-precision floats

Test codes

References