Next: Floating-Point Environment, Previous: Fixnum Operations, Up: Fixnum and Flonum Operations [Contents][Index]
A flonum is an inexact real number that is implemented as a
floating-point number. In MIT/GNU Scheme, all inexact real numbers are
flonums. For this reason, constants such as 0. and 2.3 are guaranteed to be flonums.
MIT/GNU Scheme follows the IEEE 754-2008 floating-point standard, using binary64 arithmetic for flonums. All floating-point values are classified into:
Normal numbers: numbers of the form
r^e (1 + f/r^{p-1})
where r, the radix, is a positive integer, here always 2; p, the precision, is a positive integer, here always 53; e, the exponent, is an integer within a limited range, here always -1022 to 1023 (inclusive); and f, the fractional part of the significand, is a (p-1)-bit unsigned integer.
Subnormal numbers: fixed-point numbers near zero that allow for gradual underflow. Every subnormal number is an integer multiple of the smallest subnormal number. Subnormals were also historically called “denormal”.
Zeros: there are two distinguished zero values, one with “negative” sign bit and one with “positive” sign bit.
The two zero values are considered numerically equal, but they serve to distinguish paths converging to zero along different branch cuts, so some operations yield different results for differently signed zero values.
Infinities: there are two distinguished infinity values, negative infinity or -inf.0 and positive infinity or +inf.0, representing overflow on the real line.
NaNs: there are 4 r^{p-2} - 2 distinguished not-a-number values, representing invalid operations or uninitialized data, distinguished by their negative/positive sign bit, a quiet/signalling bit, and a (p-2)-digit unsigned integer payload which must not be zero for signalling NaNs.
Arithmetic on quiet NaNs propagates them without raising any floating-point exceptions.
In contrast, arithmetic on signalling NaNs raises the floating-point invalid-operation exception.
Quiet NaNs are written +nan.123, -nan.0, etc.
Signalling NaNs are written +snan.123, -snan.1, etc.
The notations +snan.0 and -snan.0 are not allowed: the bit patterns that would encode them actually mean +inf.0 and -inf.0.
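As a small illustration of the special values described above, the following sketch uses only procedures documented later in this section. It assumes the reader accepts the +nan.123 notation, as stated above.

```scheme
;; The two zeros compare equal but carry different sign bits.
(flo:= 0. -0.)                   ; ⇒ #t
(flo:sign-negative? -0.)         ; ⇒ #t
(flo:copysign 1. -0.)            ; ⇒ -1.

;; A quiet NaN propagates through arithmetic; the result is still a NaN.
(flo:nan? (flo:+ +nan.123 1.))   ; ⇒ #t
```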
Returns #t if object is a flonum; otherwise returns #f.
These procedures are the standard order and equality predicates on flonums. When compiled, they do not check the types of their arguments. These predicates raise floating-point invalid-operation exceptions on NaN arguments; in other words, they are “ordered comparisons”. When floating-point exception traps are disabled, they return false when any argument is NaN.
Every pair of floating-point numbers, excluding NaN, exhibits ordered trichotomy: they are related by exactly one of flo:=, flo:<, or flo:>.
These procedures are the standard order and equality predicates on flonums. When compiled, they do not check the types of their arguments.
These predicates do not raise floating-point exceptions, and simply return false on NaN arguments, except flo:unordered?, which returns true iff at least one argument is NaN; in other words, they are “unordered comparisons”.
Every pair of floating-point values, including NaN, exhibits unordered tetrachotomy: they are related by exactly one of flo:safe=, flo:safe<, flo:safe>, or flo:unordered?.
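The tetrachotomy can be sketched as a four-way comparison that never raises a floating-point exception. The name compare-flonums is hypothetical and not part of MIT/GNU Scheme; only the flo:safe predicates and flo:unordered? come from the text above.

```scheme
;; Hypothetical helper: classify the relation between two flonums
;; using only the unordered (non-trapping) comparisons.
(define (compare-flonums x y)
  (cond ((flo:unordered? x y) 'unordered)   ; at least one NaN
        ((flo:safe< x y) 'less)
        ((flo:safe> x y) 'greater)
        (else 'equal)))                      ; flo:safe= holds here

(compare-flonums 1. 2.)        ; ⇒ less
(compare-flonums 0. -0.)       ; ⇒ equal
(compare-flonums 1. +nan.0)    ; ⇒ unordered
```

Testing flo:unordered? first keeps the remaining three branches restricted to non-NaN arguments, where exactly one of them applies.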
Each of these procedures compares its argument to zero. When compiled, they do not check the type of their argument. These predicates raise floating-point invalid-operation exceptions on NaN arguments; in other words, they are “ordered comparisons”.
(flo:zero? -0.) ⇒ #t
(flo:negative? -0.) ⇒ #f
(flo:negative? -1.) ⇒ #t
(flo:zero? 0.) ⇒ #t
(flo:positive? 0.) ⇒ #f
(flo:positive? 1.) ⇒ #t
(flo:zero? +nan.123) ⇒ #f ; (raises invalid-operation)
Floating-point classification predicates. For any flonum, exactly one of these predicates returns true. These predicates never raise floating-point exceptions.
(flo:normal? 1.23) ⇒ #t
(flo:subnormal? 4e-324) ⇒ #t
(flo:safe-zero? -0.) ⇒ #t
(flo:infinite? +inf.0) ⇒ #t
(flo:nan? -nan.123) ⇒ #t
Equivalent to:
(or (flo:safe-zero? flonum)
    (flo:subnormal? flonum)
    (flo:normal? flonum))
; or
(and (not (flo:infinite? flonum))
     (not (flo:nan? flonum)))
True for normal, subnormal, and zero floating-point values; false for infinity and NaN.
Returns a symbol representing the classification of the flonum, one of normal, subnormal, zero, infinity, or nan.
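Because exactly one classification predicate holds for any flonum, a dispatch over them is exhaustive. The following is a minimal sketch; describe-flonum is a hypothetical name used for illustration, built only from the predicates shown above.

```scheme
;; Hypothetical helper: map a flonum to its classification symbol
;; by trying each classification predicate in turn.
(define (describe-flonum x)
  (cond ((flo:nan? x) 'nan)
        ((flo:infinite? x) 'infinity)
        ((flo:safe-zero? x) 'zero)
        ((flo:subnormal? x) 'subnormal)
        ((flo:normal? x) 'normal)))

(describe-flonum 1.23)       ; ⇒ normal
(describe-flonum 4e-324)     ; ⇒ subnormal
(describe-flonum -0.)        ; ⇒ zero
(describe-flonum +inf.0)     ; ⇒ infinity
(describe-flonum -nan.123)   ; ⇒ nan
```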
Returns true if the sign bit of flonum is negative, and false otherwise. Never raises a floating-point exception.
(flo:sign-negative? +0.) ⇒ #f
(flo:sign-negative? -0.) ⇒ #t
(flo:sign-negative? -1.) ⇒ #t
(flo:sign-negative? +inf.0) ⇒ #f
(flo:sign-negative? +nan.123) ⇒ #f
(flo:negative? -0.) ⇒ #f
(flo:negative? +nan.123) ⇒ #f ; (raises invalid-operation)
These procedures are the standard arithmetic operations on flonums. When compiled, they do not check the types of their arguments.
Fused multiply-add: (flo:*+ u v a) computes uv+a correctly rounded, with no intermediate overflow or underflow arising from uv.
In contrast, (flo:+ (flo:* u v) a) may have two rounding errors, and can overflow or underflow if uv is too large or too small even if uv + a is normal.
Flo:fma is an alias for flo:*+ with the more familiar name used in other languages like C.
Flo:fast-fma? returns true if the implementation of fused multiply-add is supported by fast hardware, and false if it is emulated using Dekker’s double-precision algorithm in software.
(flo:+ (flo:* 1.2e100 2e208) -1.4e308)
⇒ +inf.0 ; (raises overflow)
(flo:*+ 1.2e100 2e208 -1.4e308)
⇒ 1e308
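One common use of fused multiply-add is Horner's rule for polynomial evaluation, where each step has the form acc*x + c and so incurs a single rounding. The sketch below is illustrative only; horner is a hypothetical name, and the coefficient list is assumed to contain flonums ordered from the highest degree down.

```scheme
;; Hypothetical helper: evaluate a polynomial at x by Horner's rule,
;; using one fused multiply-add per coefficient.
(define (horner x coefficients)
  (let loop ((acc 0.) (cs coefficients))
    (if (pair? cs)
        (loop (flo:*+ acc x (car cs)) (cdr cs))
        acc)))

;; x^2 + 2x + 3 at x = 2
(horner 2. '(1. 2. 3.))   ; ⇒ 11.
```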
This procedure returns the negation of its argument. When compiled, it does not check the type of its argument.
This is not equivalent to (flo:- 0. flonum):
(flo:negate 1.2) ⇒ -1.2
(flo:negate -nan.123) ⇒ +nan.123
(flo:negate +inf.0) ⇒ -inf.0
(flo:negate 0.) ⇒ -0.
(flo:negate -0.) ⇒ 0.
(flo:- 0. 1.2) ⇒ -1.2
(flo:- 0. -nan.123) ⇒ -nan.123
(flo:- 0. +inf.0) ⇒ -inf.0
(flo:- 0. 0.) ⇒ 0.
(flo:- 0. -0.) ⇒ 0.
These procedures are flonum versions of the corresponding procedures. When compiled, they do not check the types of their arguments.
Flonum versions of expm1 and log1p with restricted domains: flo:expm1 is defined only on inputs bounded below log(2) in magnitude, and flo:log1p is defined only on inputs bounded below 1 - sqrt(1/2) in magnitude.
Callers must use (- (flo:exp x) 1) or (flo:log (+ 1 x)) outside these ranges.
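The domain restriction above suggests a wrapper that picks the appropriate formula for each input. The following is a sketch under stated assumptions: the names expm1-anywhere and log1p-anywhere are hypothetical, and flo:abs and flo:sqrt are assumed to be among the flonum versions of the standard procedures, since this section does not list them by name.

```scheme
;; Hypothetical wrapper: use flo:expm1 inside its documented domain,
;; and fall back to exp(x) - 1 elsewhere.  flo:safe< is used so that
;; a NaN input takes the fallback branch without trapping here.
(define (expm1-anywhere x)
  (if (flo:safe< (flo:abs x) (flo:log 2.))
      (flo:expm1 x)
      (flo:- (flo:exp x) 1.)))

;; Hypothetical wrapper: likewise for flo:log1p and log(1 + x).
(define (log1p-anywhere x)
  (if (flo:safe< (flo:abs x) (flo:- 1. (flo:sqrt .5)))
      (flo:log1p x)
      (flo:log (flo:+ 1. x))))

(expm1-anywhere 1e-10)   ; small input: handled by flo:expm1
(expm1-anywhere 5.)      ; large input: handled by flo:exp
```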
This is the flonum version of atan with two arguments. When compiled, it does not check the types of its arguments.
Returns two values,
m = log(|Gamma(x)|) and s = sign(Gamma(x)),
respectively a flonum and an exact integer, either -1 or 1, so that
Gamma(x) = s * e^m.
Returns the min or max of two floating-point numbers. If either argument is NaN, raises the floating-point invalid-operation exception and returns the other one if it is not NaN, or the first argument if they are both NaN.
Returns the argument that has the smallest or largest magnitude, as in minNumMag or maxNumMag of IEEE 754-2008. If either argument is NaN, raises the floating-point invalid-operation exception and returns the other one if it is not NaN, or the first argument if they are both NaN.
Flo:ldexp scales by a power of two; flo:scalbn scales by a power of the floating-point radix.
ldexp x e := x * 2^e, scalbn x e := x * r^e.
In MIT/GNU Scheme, these procedures are the same; they are both provided to make it clearer which operation is meant.
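A brief illustration of the scaling operations, assuming the argument order (x e) given in the definitions above, with e an exact integer. All results here are exactly representable, so no rounding is involved.

```scheme
(flo:ldexp 1. 10)                              ; ⇒ 1024.
(flo:ldexp 3. -1)                              ; ⇒ 1.5
;; With radix 2, the two procedures agree.
(flo:= (flo:ldexp 1.5 4) (flo:scalbn 1.5 4))   ; ⇒ #t
```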
For nonzero finite x, returns floor(log(x)/log(r)) as an exact integer, where r is the floating-point radix.
For all other inputs, raises invalid-operation and returns #f.
Returns the next floating-point number after x1 in the direction of x2.
(flo:nextafter 0. -1.) ⇒ -4.9406564584124654e-324
Returns a floating-point number with the magnitude of x1 and the sign of x2.
(flo:copysign 123. 456.) ⇒ 123.
(flo:copysign +inf.0 -1.) ⇒ -inf.0
(flo:copysign 0. -1.) ⇒ -0.
(flo:copysign -0. 0.) ⇒ 0.
(flo:copysign -nan.123 0.) ⇒ +nan.123
Floating-point system parameters.
Flo:radix is the floating-point radix as an integer, and flo:precision is the floating-point precision as an integer; flo:radix. is the floating-point radix as a flonum.
Flo:error-bound, sometimes called the machine epsilon, is the maximum relative error of rounding to nearest:
max |x - fl(x)|/|x| = 1/(2 r^(p-1)),
where r is the floating-point radix and p is the floating-point precision.
Flo:ulp-of-one is the distance from 1 to the next larger floating-point number, and is equal to 1/r^{p-1}.
Flo:error-bound is half flo:ulp-of-one.
Flo:log-error-bound is the logarithm of flo:error-bound, and flo:log-ulp-of-one is the logarithm of flo:ulp-of-one.
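The relationship between these parameters can be checked directly. This sketch assumes the default rounding mode (round to nearest, ties to even) and that flo:/ is among the standard flonum arithmetic operations.

```scheme
;; The error bound is exactly half an ulp of one.
(flo:= flo:error-bound (flo:/ flo:ulp-of-one 2.))   ; ⇒ #t

;; Adding a full ulp to 1 yields the next flonum above 1 ...
(flo:= (flo:+ 1. flo:ulp-of-one) 1.)                ; ⇒ #f

;; ... whereas adding half an ulp is a tie that rounds back to 1.
(flo:= (flo:+ 1. flo:error-bound) 1.)               ; ⇒ #t
```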
Returns the distance from flonum to the next floating-point number larger in magnitude with the same sign. For zero, this returns the smallest subnormal. For infinities, this returns positive infinity. For NaN, this returns the same NaN.
(flo:ulp 1.) ⇒ 2.220446049250313e-16
(= (flo:ulp 1.) flo:ulp-of-one) ⇒ #t
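For a positive finite flonum below the largest finite value, adding its ulp gives the same result as stepping toward positive infinity with flo:nextafter. A minimal sketch follows; next-up is a hypothetical helper name.

```scheme
;; Hypothetical helper: the next flonum toward positive infinity.
(define (next-up x)
  (flo:nextafter x +inf.0))

(flo:= (flo:+ 1. (flo:ulp 1.)) (next-up 1.))         ; ⇒ #t
(flo:= (flo:+ 1e10 (flo:ulp 1e10)) (next-up 1e10))   ; ⇒ #t

;; For zero, the ulp is the smallest subnormal.
(flo:= (flo:ulp 0.) (next-up 0.))                    ; ⇒ #t
```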
Largest and smallest integer exponents of the radix in normal and subnormal floating-point numbers.
Flo:normal-exponent-max is the largest integer such that (expt flo:radix. flo:normal-exponent-max) does not overflow.
Flo:normal-exponent-min is the smallest (most negative) integer such that (expt flo:radix. flo:normal-exponent-min) is a normal floating-point number.
Flo:subnormal-exponent-min is the smallest (most negative) integer such that (expt flo:radix. flo:subnormal-exponent-min) is nonzero; that power is also the smallest positive floating-point number.
Smallest and largest normal and subnormal numbers in magnitude.
Least and greatest exponents of normal and subnormal floating-point numbers, as floating-point numbers.
For example, flo:greatest-normal-exponent-base-2 is the greatest floating-point number such that (expt 2. flo:greatest-normal-exponent-base-2) does not overflow and is a normal floating-point number.
These procedures implement the IEEE 754-2008 total ordering on floating-point values and their magnitudes.
Here the “magnitude” of a floating-point value is a floating-point value with positive sign bit and everything else the same; e.g., +nan.123 is the “magnitude” of -nan.123 and 0.0 is the “magnitude” of -0.0.
The total ordering has little to no numerical meaning and should be used only when an arbitrary choice of total ordering is required for some non-numerical reason.
Flo:total< returns true if x1 precedes x2.
Flo:total-mag< returns true if the magnitude of x1 precedes the magnitude of x2.
Flo:total-order returns -1 if x1 precedes x2, 0 if they are the same floating-point value (including sign of zero, or sign and payload of NaN), and +1 if x1 follows x2.
Flo:total-order-mag returns -1 if the magnitude of x1 precedes the magnitude of x2, etc.
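As an illustration, the total ordering distinguishes values that the numerical comparisons treat as equal or unordered, which makes it usable as a sort key. The sketch below assumes the IEEE 754-2008 totalOrder convention, in which negative zero precedes positive zero and positive NaNs follow positive infinity, and uses the standard sort procedure.

```scheme
(flo:total< -0. 0.)             ; ⇒ #t
(flo:total< 0. -0.)             ; ⇒ #f
(flo:total-order 0. -0.)        ; ⇒ 1
(flo:total-order 1. 1.)         ; ⇒ 0
(flo:total-order-mag -1. 1.)    ; ⇒ 0
(flo:total< +inf.0 +nan.0)      ; ⇒ #t

;; Sorting a list that contains a NaN and both zeros.
(sort (list 1. +nan.0 0. -0. -inf.0) flo:total<)
;; ⇒ (-inf.0 -0. 0. 1. +nan.0)
```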
Flo:make-nan creates a NaN given the sign bit, quiet bit, and payload.
Negative? and quiet? must be booleans, and payload must be an unsigned (p-2)-bit integer, where p is the floating-point precision.
If quiet? is false, payload must be nonzero.
(flo:sign-negative? (flo:make-nan negative? quiet? payload)) ⇒ negative?
(flo:nan-quiet? (flo:make-nan negative? quiet? payload)) ⇒ quiet?
(flo:nan-payload (flo:make-nan negative? quiet? payload)) ⇒ payload
(flo:make-nan #t #f 42) ⇒ -snan.42
(flo:sign-negative? +nan.123) ⇒ #f
(flo:nan-quiet? +nan.123) ⇒ #t
(flo:nan-payload +nan.123) ⇒ 123