Floating point numbers are a subset of the real numbers. Each has a built-in radix point (or ``decimal point'') that shifts, or ``floats'', as the value of the number changes. A floating point number consists of the following: one sign bit, which indicates whether the number is positive or negative; bits that encode the number's exponent; and bits that encode the number's fraction, or the number upon which the exponent works. In general, the magnitude of the number encoded depends upon the number of bits in the exponent, whereas its precision depends upon the number of bits in the fraction.
The ranges of values that can be held by a COHERENT float are set in header file float.h.
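As an illustration, the following sketch prints some of the limits that float.h reports. It assumes a compiler whose float.h defines the standard ANSI C macros FLT_MAX, FLT_MIN, FLT_DIG, DBL_MAX, DBL_MIN, and DBL_DIG; the function name show_float_limits is invented for this example.

```c
#include <float.h>
#include <stdio.h>

/* Print the limits that <float.h> reports for float and double.
 * Returns the number of lines written.
 * show_float_limits is a hypothetical name used only in this sketch. */
int show_float_limits(FILE *out)
{
    int lines = 0;

    /* Float arguments are promoted to double when passed to fprintf. */
    fprintf(out, "float:  max %e, min %e, %d decimal digits\n",
            FLT_MAX, FLT_MIN, FLT_DIG);
    lines++;
    fprintf(out, "double: max %e, min %e, %d decimal digits\n",
            DBL_MAX, DBL_MIN, DBL_DIG);
    lines++;
    return lines;
}
```

Calling show_float_limits(stdout) writes two lines; the values shown depend upon the floating point format that the compiler uses.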
The exponent often uses a bias. This is a value that is subtracted from the stored exponent to yield the power of two by which the fraction is multiplied.
Floating point numbers come in two levels of precision: single precision, called floats; and double precision, called doubles. With most microprocessors, sizeof(float) yields four, which indicates that a float is four chars (bytes) long, and sizeof(double) yields eight.
Several formats are used to encode floats, including IEEE, DECVAX, and BCD (binary coded decimal).
The following describes DECVAX, IEEE, and BCD formats, for your information.
=============
| seee eeee |Byte 4
|===========|
| efff ffff |Byte 3
|===========|
| ffff ffff |Byte 2
|===========|
| ffff ffff |Byte 1
=============
The exponent has a bias of 129.
If the sign bit is set to one, the number is negative; if it is set to zero, then the number is positive. If the number is all zeroes, then it equals zero; an exponent and fraction of zero plus a sign of one (``negative zero'') is by definition not a number. All other forms are numeric values.
The most significant bit in the fraction is always set to one and is not stored. It is usually called the ``hidden bit''.
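The rules above can be sketched in C as follows. This is only an illustration, not COHERENT library code: the function names are invented, the four bytes are assumed to have been assembled in the order shown in the diagram (Byte 4 as the most significant byte of a 32-bit value, which is not necessarily the order in which a VAX stores them in memory), and the hidden bit is assumed to sit just left of the radix point, so that the mantissa reads 1.fff... in binary.

```c
#include <stdint.h>

#define DECVAX_BIAS 129     /* bias stated in the text */

/* Multiply x by 2 to the power n; ldexp() from <math.h> does the same. */
static double scale_by_power_of_two(double x, int n)
{
    while (n > 0) { x *= 2.0; n--; }
    while (n < 0) { x /= 2.0; n++; }
    return x;
}

/* Decode a DECVAX float held in `bits` in diagram order.
 * Returns 0 and stores the value in *out, or returns -1 for the
 * "negative zero" pattern, which is by definition not a number.
 * decvax_float_decode is a hypothetical name used only in this sketch. */
int decvax_float_decode(uint32_t bits, double *out)
{
    uint32_t sign     = bits >> 31;             /* s              */
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* eight e bits   */
    uint32_t fraction = bits & 0x7FFFFFu;       /* 23 f bits      */
    double mantissa;

    if (exponent == 0 && fraction == 0) {
        if (sign)
            return -1;      /* negative zero: not a number */
        *out = 0.0;         /* all zeroes: zero */
        return 0;
    }

    /* Restore the hidden bit, then apply the unbiased exponent. */
    mantissa = 1.0 + (double)fraction / 8388608.0;  /* 8388608 is 2^23 */
    *out = scale_by_power_of_two(mantissa, (int)exponent - DECVAX_BIAS);
    if (sign)
        *out = -*out;
    return 0;
}
```

For example, the pattern 0x40800000 has a sign of zero, an exponent of 129, and a fraction of zero, so it decodes to 1.0.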
The format for doubles simply adds another 32 fraction bits to the end of the float representation, as follows:
=============
| seee eeee |Byte 8
|===========|
| efff ffff |Byte 7
|===========|
| ffff ffff |Byte 6
|===========|
| ffff ffff |Byte 5
|===========|
| ffff ffff |Byte 4
|===========|
| ffff ffff |Byte 3
|===========|
| ffff ffff |Byte 2
|===========|
| ffff ffff |Byte 1
=============
Unlike the DECVAX format, IEEE format assigns special values to several floating point numbers. Note that in the following description, a tiny exponent is one that is all zeroes, and a huge exponent is one that is all ones:
=============
| seee eeee | Byte 8
|===========|
| eeee ffff | Byte 7
|===========|
| ffff ffff | Byte 6
|===========|
| ffff ffff | Byte 5
|===========|
| ffff ffff | Byte 4
|===========|
| ffff ffff | Byte 3
|===========|
| ffff ffff | Byte 2
|===========|
| ffff ffff | Byte 1
=============
The exponent has a bias of 1,023. The rules of encoding are the same as for floats.
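To make the layout concrete, the following sketch splits a double into the fields shown in the diagram. It assumes that the compiler's double is an eight-byte IEEE value and that stdint.h and string.h are available; the structure and function names are invented for this example.

```c
#include <stdint.h>
#include <string.h>

#define IEEE_DOUBLE_BIAS 1023   /* bias stated in the text */

/* Hypothetical structure holding the fields of an IEEE double. */
struct ieee_double_fields {
    unsigned sign;          /* 1 bit */
    unsigned exponent;      /* 11 bits, as stored (still biased) */
    int power_of_two;       /* stored exponent minus the bias */
    uint64_t fraction;      /* 52 bits; the hidden bit is not included */
};

/* Split `value` into sign, exponent, and fraction.
 * Assumes sizeof(double) == 8 and IEEE encoding. */
struct ieee_double_fields ieee_double_split(double value)
{
    struct ieee_double_fields f;
    uint64_t bits;

    memcpy(&bits, &value, sizeof bits);     /* copy the raw bit pattern */
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.power_of_two = (int)f.exponent - IEEE_DOUBLE_BIAS;
    f.fraction = bits & ((((uint64_t)1) << 52) - 1);
    return f;
}
```

For the value 1.0, the stored exponent equals the bias, so power_of_two is zero and the fraction is zero. The power_of_two member is meaningful only when the exponent is neither tiny nor huge.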
A BCD float has a sign bit, seven bits of exponent, and six four-bit digits. In the following diagrams, `d' indicates ``digit'':
=============
| seee eeee | Byte 4
|===========|
| dddd dddd | Byte 3
|===========|
| dddd dddd | Byte 2
|===========|
| dddd dddd | Byte 1
=============
A BCD double has a sign bit, 11 bits of exponent, and 13 four-bit digits, as follows:
=============
| seee eeee | Byte 8
|===========|
| eeee dddd | Byte 7
|===========|
| dddd dddd | Byte 6
|===========|
| dddd dddd | Byte 5
|===========|
| dddd dddd | Byte 4
|===========|
| dddd dddd | Byte 3
|===========|
| dddd dddd | Byte 2
|===========|
| dddd dddd | Byte 1
=============
Passing the hexadecimal numbers A through F in a digit yields unpredictable results.
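A program that builds BCD values by hand may wish to check the digits first. The following is a minimal sketch for the BCD float; it assumes that the four bytes are held in a 32-bit value in diagram order (Byte 4 most significant), and the function name is invented for this example.

```c
#include <stdint.h>

/* Return 1 if all six four-bit digits of a BCD float lie in the range
 * 0 through 9, or 0 if any digit holds hexadecimal A through F.
 * The sign and exponent in Byte 4 are not examined.
 * bcd_float_digits_ok is a hypothetical name used only in this sketch. */
int bcd_float_digits_ok(uint32_t bits)
{
    int i;

    for (i = 0; i < 6; i++) {
        unsigned digit = (unsigned)((bits >> (4 * i)) & 0xFu);

        if (digit > 9)
            return 0;
    }
    return 1;
}
```

For example, the pattern 0x41123456 passes the check, whereas 0x4112A456 fails because one digit holds hexadecimal A.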
The following rules apply when handling BCD numbers:
To allow you to convert binary data from one floating-point format to another, COHERENT comes with four functions with which you can convert DECVAX-format floating-point numbers to IEEE format, and vice versa. They are as follows:
#ifdef _DECVAX
...
#elif _IEEE
...
#endif
The C preprocessor under each edition of COHERENT will ensure that the correct code is included for compilation.
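As a sketch of how such conditional code might look, the following selects the bias for doubles according to the format in use. The macro names _DECVAX and _IEEE and both bias values come from the text above; the function names and the #else branch are additions for this example, so that the code still compiles where neither macro is defined. It tests the macros with defined(), which works whether or not the macro carries a value.

```c
/* Choose format-specific constants at compile time. */
#if defined(_DECVAX)
#define FORMAT_NAME "DECVAX"
#define DOUBLE_BIAS 129         /* DECVAX bias from the text */
#elif defined(_IEEE)
#define FORMAT_NAME "IEEE"
#define DOUBLE_BIAS 1023        /* IEEE double bias from the text */
#else
#define FORMAT_NAME "unknown"   /* fallback added for this sketch */
#define DOUBLE_BIAS 0
#endif

/* Hypothetical helpers that report what the preprocessor selected. */
const char *float_format_name(void)
{
    return FORMAT_NAME;
}

int double_exponent_bias(void)
{
    return DOUBLE_BIAS;
}
```

On a compiler that defines neither macro, float_format_name() reports the fallback and the bias is zero.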