Computing reciprocals of single-precision floats

Given a non-zero number d, consider the problem of finding its reciprocal 1/d. The problem is not trivial for the low- and mid-range micro-controllers as they lack the division operation. We assume that d is represented in a normalized floating-point format

d = s · 2e · m with 1 < m < 2

where s is the sign bit, e is the order, and m is a normalized mantissa (i.e. a number is the range between 1 and 2). The reciprocal of d is then of the form

1/d = s · 2-(e+1) · (1/m)

The extra 1 in the exponent is due to the fact that 1/2 < 1/m < 1 and we assume that 1/m is normalized, i.e. shifted one bit to the left. A natural approach to compute 1/m is to use a table. However, if the mantissa takes, say, 24 bits the table size becomes impractical. The main idea, which is also used in [2], is to use a small table for an initial approximation of 1/m and then refine it by applying some iterative methods. Our problem can be viewed as finding a root of the function f(x) = d·x - 1. One could try to apply the Newton-Rapson method based on the formula

xn+1 = xn - f(xn) / f'(xn)

where f'(x) is the derivative of f(x). This way we would still have a problem because the above formula involves a division. However, there are some other known methods that do not involve the division operation. One of them is the Fixed-Point Method for finding the solution to the equation g(x) = x. We apply this method with g(x) = x - f(x). This way the initial equation f(x) = 0 is equivalent to g(x) = x. The recursive formula involved into the Fixed-Point Method is as follows

xn+1 = g(xn) = xn - f(xn) = xn - (d·xn - 1)

For this schema to convert one has to have |g'(x)|<1 which is equivalent to m<2 and, thus, satisfied for the numbers in the range. However, the schema has a pretty slow conversion and required about 20 iterations to reach the accuracy of 2-23. A better non-linear schema can be derived by using a trick of replacing the original equation d·x - 1 = 0 with d·x2 - x = 0. So, with f(x) = d·x2 - x and g(x) = x - f(x) the equation x = g(x) leads to the following iteration schema

xn+1 = g(xn) = xn - f(xn) = xn - (d·(xn)2 - xn) = 2·xn - d·(xn)2 = xn·(2 - d·xn)

The convergency condition |g'(x)|<1 in this case translates to xn>1/2d which is satisfied. This schema needs just 2 iterations to provide an accuracy of 2-23 for a 23-bit mantissa. This corresponds to the single-precision accuracy (data type float) of high-level programming languages like C or Java. It, however, involves two multiplications instead of just one above. Since software-implemented 32-bit multiplication takes up to 500 CPU cycles in modern 8-bit micro-controllers without a multiplication support in hardware and is at least twice slower than addition, there is a trade-off between the speed and complexity.

In either case to apply the iterations we need a reasonable initial approximation to 1/d. This can be achieved by a table with 127 or 256 entries that takes the high-order 7 or 8 bits of m as an address and returns the high-order byte approximation to 1/m.

Test codes

References