Programming the Floating Point Unit (FPU)
Hardware
The numeric coprocessor (FPU) has eight 80-bit registers named ST0, ST1, ..., ST7. These registers are organized as a stack, referred to below as fstack, but FPU operations can also address any particular register directly. ST0 is the top of the stack.
| ST0 |
|---|
| ST1 |
| ST2 |
| ST3 |
| ST4 |
| ST5 |
| ST6 |
| ST7 |
An attempt to push more than 8 values onto fstack results in an exception.
| Instr. | Comments |
|---|---|
| FINIT | initialize the FPU and clear fstack |
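The stack behaviour described above can be sketched as a toy model in C. This is only an illustration under stated assumptions: the type `FStack` and the functions `fstack_init`, `fstack_push`, `fstack_pop`, and `fstack_st` are hypothetical names, `long double` stands in for the 80-bit register format, and a simple `depth` counter replaces the hardware's tag bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the FPU register stack (hypothetical names, not an emulator). */
typedef struct {
    long double reg[8]; /* the eight physical registers */
    int top;            /* index of the physical register currently acting as ST0 */
    int depth;          /* number of occupied registers, 0..8 */
} FStack;

/* FINIT-like reset: the stack becomes empty. */
void fstack_init(FStack *s) {
    s->top = 0;
    s->depth = 0;
}

/* FLD-like push: returns false where the hardware would report stack overflow. */
bool fstack_push(FStack *s, long double v) {
    if (s->depth == 8) return false;
    s->top = (s->top + 7) % 8;   /* top moves down by one, modulo 8 */
    s->reg[s->top] = v;
    s->depth++;
    return true;
}

/* FSTP-like pop: returns false on an empty stack. */
bool fstack_pop(FStack *s, long double *out) {
    if (s->depth == 0) return false;
    *out = s->reg[s->top];
    s->top = (s->top + 1) % 8;
    s->depth--;
    return true;
}

/* Read ST(i) relative to the current top without popping. */
long double fstack_st(const FStack *s, int i) {
    assert(i >= 0 && i < s->depth);
    return s->reg[(s->top + i) % 8];
}
```

After eight pushes, the most recent value is ST0 and the first value pushed is ST7; a ninth push fails in the model, which corresponds to the exception mentioned above.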
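The FPU has its own 16-bit status register. Comparison operations set particular bits of this register (the condition codes C0, C2, and C3). However, the FPU status register cannot be used directly for conditional jumps; its contents must first be copied into the CPU's FLAGS register.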
| Instr. | Comments |
|---|---|
| FSTSW mem | copy status reg to mem word |
| FSTSW AX | copy status reg to AX |
| SAHF | store AH into the low byte of the FLAGS reg |
| LAHF | load the low byte of the FLAGS reg into AH |
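To make the FSTSW AX / SAHF pair concrete, here is a small C sketch of where the condition-code bits end up. The function name `sahf_from_status` and the enum names are hypothetical; the bit positions (C0 = bit 8, C2 = bit 10, C3 = bit 14 of the status word; CF = bit 0, PF = bit 2, ZF = bit 6 of FLAGS) follow the usual x87 layout.

```c
#include <stdint.h>

/* Condition-code bit positions in the 16-bit FPU status word. */
enum { FPU_C0 = 1 << 8, FPU_C2 = 1 << 10, FPU_C3 = 1 << 14 };

/* Positions of the matching flags in the low byte of FLAGS. */
enum { FLAG_CF = 1 << 0, FLAG_PF = 1 << 2, FLAG_ZF = 1 << 6 };

/* Model of FSTSW AX followed by SAHF. AH holds status-word bits 8..15 and is
   copied into the low byte of FLAGS, so C0 lands in CF, C2 in PF, and C3 in ZF.
   Only these three flags are returned; the other bits are masked off. */
uint8_t sahf_from_status(uint16_t status_word) {
    uint8_t ah = (uint8_t)(status_word >> 8);
    return (uint8_t)(ah & (FLAG_CF | FLAG_PF | FLAG_ZF));
}
```

After a comparison, ST0 > src leaves all three flags clear, ST0 < src sets CF, equality sets ZF, and an unordered result (a NaN operand) sets all three, which is why the unsigned jumps (JA, JB, JE) are the ones that apply afterwards.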
The FPU also has a 16-bit control register. This register determines, for example, how a floating-point number in the FPU is converted to an integer representation. By default, the number is rounded to the nearest integer, with ties going to the even integer.
| Instr. | Comments |
|---|---|
| FSTCW mem | copy control reg to mem word |
| FLDCW mem | load a mem word into the control reg |
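A common use of FSTCW and FLDCW is to copy the control word to memory, change its rounding-control field, and load it back. The C sketch below models only the bit manipulation on the memory copy; `cw_set_rounding`, `cw_get_rounding`, and the constant names are hypothetical. It assumes the usual x87 layout, in which rounding control occupies bits 10 and 11 and the control word after FINIT is 0x037F.

```c
#include <stdint.h>

#define FPU_CW_DEFAULT 0x037Fu  /* control word after FINIT */
#define FPU_RC_MASK    0x0C00u  /* rounding-control field, bits 10..11 */

typedef enum {
    RC_NEAREST  = 0,  /* round to nearest, ties to even (default) */
    RC_DOWN     = 1,  /* round toward minus infinity */
    RC_UP       = 2,  /* round toward plus infinity */
    RC_TRUNCATE = 3   /* round toward zero */
} RoundingControl;

/* Return a copy of cw with its rounding-control field replaced by rc. */
uint16_t cw_set_rounding(uint16_t cw, RoundingControl rc) {
    return (uint16_t)((cw & ~FPU_RC_MASK) | ((unsigned)rc << 10));
}

/* Extract the rounding-control field from cw. */
RoundingControl cw_get_rounding(uint16_t cw) {
    return (RoundingControl)((cw & FPU_RC_MASK) >> 10);
}
```

For example, `cw_set_rounding(FPU_CW_DEFAULT, RC_TRUNCATE)` gives 0x0F7F; loading that value with FLDCW would make FIST and FISTP truncate instead of rounding to nearest.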
Loading and Storing instructions
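In the following instructions, src is an FPU register or a floating-point memory location of a dword or qword (FLD and FSTP also accept an 80-bit tword), and mem is an integer memory location of a word, dword, or qword (FIST accepts only a word or dword).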
| Loading instr. | Comments | Storing instr. | Comments |
|---|---|---|---|
| FLD src | push src into fstack | FST src | copy ST0 to src |
| FILD mem | push an integer mem into fstack | FSTP src | copy ST0 to src, then pop |
| FLD1 | push 1.0 into fstack | FIST mem | copy ST0 as int to mem |
| FLDZ | push 0.0 into fstack | FISTP mem | copy ST0 as int to mem, then pop |
| FLDPI | push pi (3.14...) into fstack | | |
| FLDL2E | push log2(e) into fstack | FXCH STn | exchange ST0 and STn |
| FLDL2T | push log2(10) into fstack | FXCHP STn | exchange ST0 and STn, then pop |
| FLDLG2 | push log10(2) into fstack | | |
| FLDLN2 | push ln(2) into fstack | | |
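Note that FXCHP STn has the same effect as FSTP STn: after the exchange, the pop discards the old value of STn and leaves the old value of ST0 in its place. FXCHP does not appear in Intel's instruction set reference, so portable code should use FSTP STn.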
Addition and Multiplication instructions
In the following instructions, STn is an FPU register, mem is a memory location of an integer word or dword, and src is an FPU register or a memory location of a dword or qword.
| Addition instr. | Comments | Multiplication instr. | Comments |
|---|---|---|---|
| FADD src | ST0 += src | FMUL src | ST0 *= src |
| FADD STn, ST0 | STn += ST0 | FMUL STn, ST0 | STn *= ST0 |
| FADDP STn, ST0 | STn += ST0, then pop | FMULP STn, ST0 | STn *= ST0, then pop |
| FIADD mem | ST0 += (float)mem | FIMUL mem | ST0 *= (float)mem |
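One consequence of the "then pop" forms is easy to miss: after FADDP STn, ST0 the sum is no longer in STn but one slot higher, because the pop renumbers the registers. The following C sketch models this on a plain array. The function `faddp_model` is a hypothetical name, `double` is used instead of the 80-bit format, and the model restricts n to 1 or more for simplicity.

```c
#include <assert.h>
#include <string.h>

#define FSTACK_SIZE 8

/* st[0] models ST0, st[1] models ST1, and so on; *depth counts live entries.
   Models FADDP STn, ST0 for n >= 1. */
void faddp_model(double st[FSTACK_SIZE], int *depth, int n) {
    assert(n >= 1 && n < *depth);
    st[n] += st[0];  /* STn += ST0 */
    /* Pop: every remaining entry moves up one slot, so old STn becomes ST(n-1). */
    memmove(&st[0], &st[1], (size_t)(*depth - 1) * sizeof st[0]);
    (*depth)--;
}
```

With ST0 = 2, ST1 = 3, ST2 = 10, the model of FADDP ST1, ST0 leaves ST0 = 5 and ST1 = 10. This is why pushing two operands and then using the ST1 form leaves the result on top of the stack.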
Subtraction and Division instructions
In the following instructions, STn is an FPU register, src is an FPU register or a memory location of a dword or qword, and imem is a memory location of an integer word or dword.
| Subtraction instr. | Comments | Division instr. | Comments |
|---|---|---|---|
| FSUB src | ST0 -= src | FDIV src | ST0 /= src |
| FSUBR src | ST0 = src - ST0 | FDIVR src | ST0 = src / ST0 |
| FSUB STn, ST0 | STn -= ST0 | FDIV STn, ST0 | STn /= ST0 |
| FSUBR STn, ST0 | STn = ST0 - STn | FDIVR STn, ST0 | STn = ST0 / STn |
| FSUBP STn, ST0 | STn -= ST0, then pop | FDIVP STn, ST0 | STn /= ST0, then pop |
| FSUBRP STn, ST0 | STn = ST0 - STn, then pop | FDIVRP STn, ST0 | STn = ST0 / STn, then pop |
| FISUB imem | ST0 -= (float)imem | FIDIV imem | ST0 /= (float)imem |
| FISUBR imem | ST0 = (float)imem - ST0 | FIDIVR imem | ST0 = (float)imem / ST0 |
Comparison instructions
In the following instructions, src is an FPU register or a memory location of a dword or qword, and imem is a memory location of an integer word or dword.
| Instr. | Comments |
|---|---|
| FCOM src | compare ST0 and src |
| FCOMP src | compare ST0 and src, then pop |
| FCOMPP | compare ST0 and ST1, then pop twice |
| FICOM imem | compare ST0 and (float)imem |
| FICOMP imem | compare ST0 and (float)imem, then pop |
| FTST | compare ST0 and 0 |
Starting with the Pentium Pro, the CPU also supports two instructions that set the CPU's FLAGS register directly.
| Instr. | Comments |
|---|---|
| FCOMI STn | compare ST0 and STn |
| FCOMIP STn | compare ST0 and STn, then pop |
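These instructions set ZF, PF, and CF in FLAGS in the same way as a comparison of two unsigned numbers, so the result can be tested with JA, JAE, JB, JBE, JE, and JNE. If the operands cannot be compared (one of them is a NaN), all three flags are set.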
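The flag settings of FCOMI can be sketched in C as follows. This is a model only: `CmpFlags`, `fcomi_model`, and the `*_taken` helpers are hypothetical names, `double` stands in for the FPU registers, and the NaN test `x != x` is used so that the sketch needs no math library.

```c
#include <stdbool.h>

typedef struct { bool zf, pf, cf; } CmpFlags;

/* Flags as FCOMI ST0, STn would leave them, with a = ST0 and b = STn. */
CmpFlags fcomi_model(double a, double b) {
    CmpFlags f = { false, false, false };  /* a > b: all three flags clear */
    if (a != a || b != b) {                /* a NaN operand: unordered */
        f.zf = f.pf = f.cf = true;
    } else if (a < b) {
        f.cf = true;                       /* "below", as for unsigned integers */
    } else if (a == b) {
        f.zf = true;                       /* "equal" */
    }
    return f;
}

/* Conditions tested by the corresponding jumps. */
bool ja_taken(CmpFlags f) { return !f.cf && !f.zf; }
bool jb_taken(CmpFlags f) { return f.cf; }
bool je_taken(CmpFlags f) { return f.zf; }
bool jp_taken(CmpFlags f) { return f.pf; }
```

Note that in the unordered case both JB and JE would be taken, so code that may see NaN operands typically tests JP before the other jumps.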
Miscellaneous instructions
| Instr. | Comments |
|---|---|
| FCHS | ST0 = -ST0 |
| FABS | ST0 = \|ST0\| |
| FSQRT | ST0 = sqrt(ST0) |
| FRNDINT | round ST0 to integer |
| FSCALE | ST0 = ST0 * 2^trunc(ST1) |
The FSCALE instruction is used for a quick multiplication of ST0 by a power of 2; the exponent is the value of ST1 truncated toward zero. The value of ST1 is not removed from fstack.
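The effect of FSCALE can be sketched in C as below. Intel's instruction reference describes the exponent as ST1 truncated toward zero, which differs from floor for negative non-integer values. The function `fscale_model` is a hypothetical name, and the loop-based scaling is chosen only to keep the sketch free of library calls; it is meant for small exponents.

```c
#include <assert.h>

/* Model of FSCALE: returns st0 * 2^trunc(st1). Intended for small exponents. */
double fscale_model(double st0, double st1) {
    assert(st1 > -1024.0 && st1 < 1024.0);  /* keep the conversion and loops bounded */
    long n = (long)st1;                      /* C's conversion truncates toward zero */
    double result = st0;
    for (; n > 0; n--) result *= 2.0;        /* positive exponent: repeated doubling */
    for (; n < 0; n++) result /= 2.0;        /* negative exponent: repeated halving */
    return result;
}
```

For example, `fscale_model(3.0, 2.9)` yields 12.0 because 2.9 truncates to 2, and `fscale_model(3.0, -1.5)` yields 1.5 because -1.5 truncates to -1 (flooring would give -2 and a result of 0.75).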