Absolute and Relative Error

In the Numbers étude, many students failed to grasp either that the model code was computing relative errors or the practical significance of the relative errors obtained, given that the model code uses single-precision floats.

Recall that IEEE single-precision floating point numbers are
S: 1 bit, E: 8 bits, M: 23 bits

where S is the sign bit, E is the exponent field, and M is the significand aligned so that its most significant 1 bit is just off the edge of what's stored. (This is called a “hidden bit” and means that the significand is effectively 24 bits.)

There are two very special values for the exponent field.

all bits 0.
This is the smallest possible exponent. These numbers represent (-1)^S × M × 2^-149. When M = 0, this is zero, and yes, IEEE arithmetic distinguishes between +0.0 and -0.0. When M > 0, these numbers are called “subnormal” because they are smaller (in magnitude) than the smallest normalised number.

If you try to represent a positive number less than or equal to 2^-150 it will be rounded to 0.0. This is called underflow.

all bits 1.
This is the largest possible exponent. These numbers represent ∞ (S=0) or -∞ (S=1) if M = 0; if M > 0 they are “Not-a-Number” values.

An ∞ result indicates a result that was too big to fit (overflow); NaN results indicate some sort of error, such as sqrt(-1).

all other values of E are not special.
These numbers are called “normalised” numbers because the significand is scaled to put the most significant 1 bit in a fixed place. They represent (-1)^S × (2^23 + M) × 2^(E-150). Zero, subnormal numbers, and normalised numbers are collectively called finite numbers.

Absolute error

Suppose we have a true value T and a calculated or otherwise estimated value E. Just how wrong is E? One answer is the absolute error:

abserr(E,T) = abs(E-T)

This is a good way to measure error if you expect the sizes of errors not to vary much with the size of the true values, or when the range of possible true values is not great. For example, if you can measure temperatures to ±0.2K (which is way better than almost all weather measurements) then since temperatures typically range from 233K to 313K in places people are likely to care about, absolute error is a good quality measure. (See this article for accuracy of clinical thermometers. 0.2K wasn't picked out of thin air.)

Relative error

Supposing T to be non-zero, we can define the relative error:

relerr(E,T) = abs((E-T)/T)

That is, the absolute error scaled by the size of T. Let's return to our temperature example. Real measuring instruments tend to get worse at the ends of their ranges. Let's use some actual numbers from a Texas Instruments data sheet.

The absolute error isn't even close to constant.

Except in the middle of the range (where it is worse), the relative error is close to 1 part in 300. The typical error in the middle of the range is quoted as ±0.05°C, which is close to the worst-case relative error elsewhere; that seems reasonable.

The next number in that data sheet is worth noting: two adjacent sensors on the same tape, manufactured just instants apart, could differ by ±0.1°C. We see that two “sibling” sensors placed in different circuits measuring close but different places could disagree by ±0.2°C, and that would not mean there was any real difference at all. Again, the 0.2K figure wasn't pulled out of thin air! Next time someone tells you we know global temperatures to better than ±1°C, laugh yourself sick.

Because physical measurement errors tend to grow with the size of the quantity being measured, floating point, whose representation errors grow with the represented value, isn't a bad fit. Single precision floating point numbers, for example, are much better than we need to record temperatures. (Even “ultra-precise” measurements taken with extremely expensive laboratory equipment.)

Of course calculations introduce their own errors. This makes it useful to have a feel for how much error is unavoidable, just because of the way numbers are represented.

A demo program

The following C program shows the absolute and relative errors for adjacent single precision floats. That is, we are concerned here with numbers that are as close to each other as they can possibly be without being the same number.

#include <float.h>
#include <math.h>
#include <stdio.h>

static float abserr(float derived, float correct) {
    return fabsf(derived - correct);
}

static float relerr(float derived, float correct) {
    return fabsf((derived - correct)/correct);
}

static void show(char const *label, float derived, float correct) {
    printf("%s  %.1e  %.1e\n", label,
        abserr(derived, correct), relerr(derived, correct));
}

int main(void) {
    union { float f; unsigned u; } pun;
    float const m = powf(2.0f, -24);

    printf("          absolute relative\naround    error    error\n");

    pun.f = FLT_MIN;
    pun.u += 1;                 /* next float above FLT_MIN */
    show("FLT_MIN+", pun.f, FLT_MIN);

    pun.f = m;
    pun.u += 1;                 /* next float above 2^-24 */
    show("6.0e-8+ ", pun.f, m);

    pun.f = 1.0f;
    pun.u -= 1;                 /* next float below 1.0 */
    show("1.0-    ", pun.f, 1.0f);

    pun.f = 1.0f;
    pun.u += 1;                 /* next float above 1.0 */
    show("1.0+    ", pun.f, 1.0f);

    pun.f = 1.0f/m;
    pun.u -= 1;                 /* next float below 2^24 */
    show("1.7e+7- ", pun.f, 1.0f/m);

    pun.f = FLT_MAX;
    pun.u -= 1;                 /* next float below FLT_MAX */
    show("FLT_MAX-", pun.f, FLT_MAX);

    return 0;
}
Here's the output of that program.
          absolute relative
around    error    error
FLT_MIN+  1.4e-45  1.2e-07
6.0e-8+   7.1e-15  1.2e-07
1.0-      6.0e-08  6.0e-08
1.0+      1.2e-07  1.2e-07
1.7e+7-   1.0e+00  6.0e-08
FLT_MAX-  2.0e+31  6.0e-08

We see that the absolute errors grow with the numbers, while the relative errors fluctuate between 1.2e-7 and half that. In fact the official figure is

FLT_EPSILON = 1.1920928955078125×10^-7

What does this mean?

If the relative error of a single precision result is a small multiple of 1.2e-7, that means it is about as good as you are likely to get. (The C <math.h> library and java.lang.Math class go to great lengths to get 1 bit worst case error. You aren't likely to write code that accurate, and neither am I.) It's certainly much smaller than the accuracy of most physical measurements.

If the relative error of a calculation is 1.0, that typically means that the result underflowed to 0.0.

If the relative error of a calculation is ∞, that typically means that the result overflowed.

If the relative error is NaN, that means that somewhere in the calculation something stupid happened.

Of course, only in carefully constructed test cases are we likely to know what the true value T is. If we have two different calculations for the same result, we can compute their absolute and relative errors. It is common to use

relerr(E,F) = abs(E-F)/max(abs(E),abs(F))

in this case. A large relative error in such a case indicates trouble. A small relative error may simply mean that both are wrong.