Floating Point Unit

Programming the Floating Point Unit (FPU)

Hardware

The numeric coprocessor/FPU has eight 80-bit registers named ST0, ST1, ..., ST7. These registers are organized as a stack (referred to below as fstack); however, any particular register can also be accessed directly in FPU operations. ST0 is the top of the stack.

ST0
ST1
ST2
ST3
ST4
ST5
ST6
ST7

An attempt to push more than 8 values onto fstack results in an exception.

Instr.	Comments
FINIT	initialize the FPU and clear fstack
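
The stack behavior described above can be sketched with a small Python model. This is only an illustration, not FPU code: the class `FpuStack` and its method names are hypothetical, and the model raises a Python exception at the point where the hardware would signal a stack fault.

```python
class FpuStack:
    """Toy model of the eight-register fstack (hypothetical helper)."""

    SIZE = 8

    def __init__(self):
        self.finit()

    def finit(self):
        # FINIT: every register is marked empty again.
        self.regs = []  # regs[0] models ST0, regs[1] models ST1, ...

    def push(self, value):
        # A ninth push has nowhere to go.
        if len(self.regs) == self.SIZE:
            raise OverflowError("fstack overflow: all 8 registers are in use")
        self.regs.insert(0, float(value))

    def pop(self):
        if not self.regs:
            raise IndexError("fstack underflow: no register is in use")
        return self.regs.pop(0)

    def st(self, n):
        # Direct access to STn, counted from the current top of the stack.
        return self.regs[n]


fpu = FpuStack()
for v in (1.0, 2.0, 3.0):
    fpu.push(v)
print(fpu.st(0), fpu.st(2))  # 3.0 1.0 (the last value pushed is ST0)
```

Because STn is counted from the top, the same value changes its register name with every push and pop.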

The FPU has its own 16-bit status register. The comparison operations set particular bits of this register. However, the FPU status register cannot be used directly for conditional jumps; its bits must first be copied into the CPU's FLAGS register.

Instr.	Comments
FSTSW mem	copy status reg to mem word
FSTSW AX	copy status reg to AX
SAHF	store AH into the low byte of the FLAGS reg
LAHF	load the low byte of the FLAGS reg into AH
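
The FSTSW AX / SAHF sequence works because of where the comparison bits sit in the status register. The following Python sketch models that transfer; it assumes the bit layout given in Intel's documentation (C0 in bit 8, C2 in bit 10, C3 in bit 14), and the function name `fstsw_sahf` is hypothetical.

```python
C0, C2, C3 = 1 << 8, 1 << 10, 1 << 14  # condition-code bits of the status register


def fstsw_sahf(status_word):
    """Model FSTSW AX followed by SAHF; return the resulting (CF, PF, ZF)."""
    ah = (status_word >> 8) & 0xFF  # FSTSW AX leaves bits 8..15 in AH
    cf = ah & 1                     # FLAGS bit 0 (CF) <- AH bit 0 (C0)
    pf = (ah >> 2) & 1              # FLAGS bit 2 (PF) <- AH bit 2 (C2)
    zf = (ah >> 6) & 1              # FLAGS bit 6 (ZF) <- AH bit 6 (C3)
    return cf, pf, zf


print(fstsw_sahf(C0))  # (1, 0, 0)
print(fstsw_sahf(C3))  # (0, 0, 1)
print(fstsw_sahf(0))   # (0, 0, 0)
```

After the transfer, C0 plays the role of the carry flag and C3 the role of the zero flag, so the usual conditional jumps can be applied.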

The FPU also has a 16-bit control register. This register determines, for example, how a floating-point number in the FPU is converted into integer representation. By default, the number is rounded to the nearest integer.

Instr.	Comments
FSTCW mem	copy control reg to mem word
FLDCW mem	load a mem word into the control reg
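
A common use of FSTCW and FLDCW is to change the rounding mode: copy the control register to memory, modify the rounding-control field, and load it back. The Python sketch below models this, assuming the layout from Intel's documentation (rounding control in bits 10 and 11, control word 0x037F after FINIT). The names `set_rounding` and `fist` are hypothetical.

```python
import math

RC_MASK = 0x0C00  # rounding-control field, bits 10 and 11
RC_NEAREST, RC_DOWN, RC_UP, RC_TRUNC = 0x0000, 0x0400, 0x0800, 0x0C00
DEFAULT_CW = 0x037F  # control word after FINIT


def set_rounding(cw, mode):
    """Return cw with its rounding-control field replaced by mode."""
    return (cw & ~RC_MASK) | mode


def fist(value, cw):
    """Model FIST: convert value to an integer under the mode selected by cw."""
    mode = cw & RC_MASK
    if mode == RC_DOWN:
        return math.floor(value)
    if mode == RC_UP:
        return math.ceil(value)
    if mode == RC_TRUNC:
        return math.trunc(value)
    return round(value)  # nearest; ties go to the even integer


cw = set_rounding(DEFAULT_CW, RC_TRUNC)
print(hex(cw))                                                # 0xf7f
print(fist(2.7, DEFAULT_CW), fist(2.7, cw), fist(-2.7, cw))   # 3 2 -2
```

Truncation is the mode needed to reproduce a C-style cast from floating point to integer.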

Loading and Storing instructions

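In the following instructions, src is an FPU register or a memory location holding a floating-point dword or qword (FLD and FSTP also accept a 10-byte extended-precision operand), and mem is a memory location holding an integer word, dword, or qword (FIST accepts only a word or dword).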

Loading instr.		Storing instr.
FLD src	push src into fstack	FST src	copy ST0 to src
FILD mem	push an integer mem into fstack	FSTP src	copy ST0 to src, then pop
FLD1	push 1.0 into fstack	FIST mem	copy ST0 as int to mem
FLDZ	push 0.0 into fstack	FISTP mem	copy ST0 as int to mem, then pop
FLDPI	push pi (3.14...) into fstack
FLDL2E	push log₂e into fstack	FXCH STn	exchange ST0 and STn
FLDL2T	push log₂10 into fstack	FXCHP STn	exchange ST0 and STn, then pop
FLDLG2	push log₁₀2 into fstack
FLDLN2	push logₑ2 into fstack

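The effect listed for FXCHP STn (exchange ST0 and STn, then pop) is the same as that of FSTP STn. Note that FXCHP is not among the mnemonics documented in Intel's instruction set reference, so FSTP STn is the form to use in practice.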
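
The equivalence between "exchange, then pop" and FSTP STn can be checked on a small Python model in which a list stands for fstack and index 0 stands for ST0. The helper names `fxch` and `fstp` are hypothetical and only mirror the table entries.

```python
def fxch(stack, n):
    """FXCH STn: exchange ST0 and STn (stack[0] models ST0)."""
    stack[0], stack[n] = stack[n], stack[0]


def fstp(stack, n):
    """FSTP STn: copy ST0 to STn, then pop."""
    stack[n] = stack[0]
    stack.pop(0)


a = [1.0, 2.0, 3.0, 4.0]  # ST0=1.0, ST1=2.0, ST2=3.0, ST3=4.0
b = list(a)

fxch(a, 2)  # exchange ST0 and ST2 ...
a.pop(0)    # ... then pop
fstp(b, 2)  # store ST0 into ST2, then pop

print(a)  # [2.0, 1.0, 4.0]
print(b)  # [2.0, 1.0, 4.0]
```

In both cases the old value of STn is discarded by the pop, which is why the two sequences leave the same stack.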

Addition and Multiplication instructions

In the following instructions, STn is an FPU register, mem is a memory location of an integer word or dword, and src is an FPU register or a memory location of a floating-point dword or qword.

Addition instr.		Multiplication instr.
FADD src	ST0 += src	FMUL src	ST0 *= src
FADD STn, ST0	STn += ST0	FMUL STn, ST0	STn *= ST0
FADDP STn, ST0	STn += ST0, then pop	FMULP STn, ST0	STn *= ST0, then pop
FIADD mem	ST0 += (float)mem	FIMUL mem	ST0 *= (float)mem
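
As an illustration of how these instructions combine, the expression a*b + c*d could be evaluated with the sequence FLD a, FMUL b, FLD c, FMUL d, FADDP ST1, ST0. The Python sketch below traces that sequence on a list standing for fstack; the helper functions and the name `dot2` are hypothetical.

```python
def fld(st, x):
    st.insert(0, float(x))  # FLD src: push src


def fmul(st, x):
    st[0] *= x              # FMUL src: ST0 *= src


def faddp(st, n):
    st[n] += st[0]          # FADDP STn, ST0: STn += ST0 ...
    st.pop(0)               # ... then pop


def dot2(a, b, c, d):
    """Evaluate a*b + c*d the way the instruction sequence would."""
    st = []
    fld(st, a)    # ST0 = a
    fmul(st, b)   # ST0 = a*b
    fld(st, c)    # ST0 = c, ST1 = a*b
    fmul(st, d)   # ST0 = c*d, ST1 = a*b
    faddp(st, 1)  # ST1 += ST0, pop: ST0 = a*b + c*d
    return st


print(dot2(1.5, 2.0, 0.5, 4.0))  # [5.0]
```

The popping form leaves exactly one value on the stack, so the routine does not leak registers.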

Subtraction and Division instructions

In the following instructions, STn is an FPU register, src is an FPU register or a memory location of a floating-point dword or qword, and imem is a memory location of an integer word or dword.

Subtraction instr.		Division instr.
FSUB src	ST0 -= src	FDIV src	ST0 /= src
FSUBR src	ST0 = src - ST0	FDIVR src	ST0 = src / ST0
FSUB STn, ST0	STn -= ST0	FDIV STn, ST0	STn /= ST0
FSUBR STn, ST0	STn = ST0 - STn	FDIVR STn, ST0	STn = ST0 / STn
FSUBP STn, ST0	STn -= ST0, then pop	FDIVP STn, ST0	STn /= ST0, then pop
FSUBRP STn, ST0	STn = ST0 - STn, then pop	FDIVRP STn, ST0	STn = ST0 / STn, then pop
FISUB imem	ST0 -= (float)imem	FIDIV imem	ST0 /= (float)imem
FISUBR imem	ST0 = (float)imem - ST0	FIDIVR imem	ST0 = (float)imem / ST0
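
Subtraction and division are not commutative, which is why each has a reversed form. The Python sketch below contrasts the one-operand forms; the function names simply mirror the mnemonics and are not a real API.

```python
def fsub(st0, src):
    return st0 - src  # FSUB src:  ST0 = ST0 - src


def fsubr(st0, src):
    return src - st0  # FSUBR src: ST0 = src - ST0


def fdiv(st0, src):
    return st0 / src  # FDIV src:  ST0 = ST0 / src


def fdivr(st0, src):
    return src / st0  # FDIVR src: ST0 = src / ST0


st0 = 4.0
print(fsub(st0, 1.0), fsubr(st0, 1.0))  # 3.0 -3.0
print(fdiv(st0, 1.0), fdivr(st0, 1.0))  # 4.0 0.25
```

For example, with the value 1.0 stored in memory, FDIVR replaces ST0 by its reciprocal without an extra load or exchange.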

Comparison instructions

In the following instructions, src is an FPU register or a memory location of a floating-point dword or qword, and imem is a memory location of an integer word or dword.

Instr.	Comments
FCOM src	compare ST0 and src
FCOMP src	compare ST0 and src, then pop
FCOMPP	compare ST0 and ST1, then pop twice
FICOM imem	compare ST0 and (float)imem
FICOMP imem	compare ST0 and (float)imem, then pop
FTST	compare ST0 and 0

Starting with the Pentium Pro, the CPU also supports two new comparison instructions that directly modify the CPU's FLAGS register.

Instr.	Comments
FCOMI STn	compare ST0 and STn
FCOMIP STn	compare ST0 and STn, then pop

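The bits from the status register are transferred into FLAGS so that they are analogous to the result of a comparison of two unsigned numbers. Consequently, the unsigned conditional jumps (JA, JAE, JB, JBE, JE, JNE) should be used after a floating-point comparison, not the signed ones.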
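
The flag settings can be sketched in Python as follows. The model assumes the outcome table in Intel's documentation for FCOMI (ZF, PF, CF are all cleared for "greater", CF is set for "less", ZF is set for "equal", and all three are set when an operand is NaN). The function names are hypothetical.

```python
import math


def fcomi(st0, stn):
    """Model FCOMI STn; return the resulting (ZF, PF, CF)."""
    if math.isnan(st0) or math.isnan(stn):
        return 1, 1, 1  # unordered: all three flags are set
    if st0 > stn:
        return 0, 0, 0
    if st0 < stn:
        return 0, 0, 1
    return 1, 0, 0      # equal


def ja(zf, pf, cf):
    return cf == 0 and zf == 0  # JA: jump if above


def jb(zf, pf, cf):
    return cf == 1              # JB: jump if below


def je(zf, pf, cf):
    return zf == 1              # JE: jump if equal


print(ja(*fcomi(2.0, 1.0)))  # True
print(jb(*fcomi(1.0, 2.0)))  # True
print(je(*fcomi(1.0, 1.0)))  # True
```

In this model a NaN operand satisfies both `jb` and `je`, so code that may see NaNs would test PF first.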

Miscellaneous instructions

Instr.	Comments
FCHS	ST0 = -ST0
FABS	ST0 = |ST0|
FSQRT	ST0 = sqrt(ST0)
FRNDINT	round ST0 to an integer (using the current rounding mode)
FSCALE	ST0 = ST0 * 2^trunc(ST1) (ST1 is truncated toward zero)

The FSCALE instruction is used for a quick multiplication of ST0 by a power of 2. The value of ST1 is not removed from fstack.
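
The effect of FSCALE can be modeled with Python's `math.ldexp`. This sketch assumes, following Intel's description of the instruction, that the exponent is obtained by truncating ST1 toward zero; for non-negative ST1 this coincides with the floor. The function name `fscale` is hypothetical.

```python
import math


def fscale(st0, st1):
    """Model FSCALE: return st0 * 2**trunc(st1); st1 itself is left unchanged."""
    return math.ldexp(st0, math.trunc(st1))


print(fscale(3.0, 4.0))   # 48.0
print(fscale(3.0, 2.9))   # 12.0
print(fscale(3.0, -1.5))  # 1.5
```

Since only the exponent of ST0 is adjusted, the scaling is exact as long as the result stays within the representable range.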