Skip to main content

Unit 6.2.3 Models of floating point computation

When computing with floating point numbers on a target computer, we will assume that all (floating point) arithmetic that is performed is in terms of additions, subtractions, multiplications, and divisions: \(\left\{ +, -, \times, / \right\}\text{.}\)

Subsubsection Notation

In our discussions, we will distinguish between exact and computed quantities. The function \(\fl{ \mbox{expression} }\) returns the result of the evaluation of \(\mbox{expression} \text{,}\) where every operation is executed in floating point arithmetic. For example, given \(\chi, \psi, \zeta, \omega \in F \) and assuming that the expressions are evaluated from left to right and order of operations is obeyed,

\begin{equation*} \fl{ \chi + \psi + \zeta/\omega } \end{equation*}

is equivalent to

\begin{equation*} \fl{ \fl{ \chi + \psi } + \fl{ \zeta / \omega } } \text{.} \end{equation*}

Equality between the quantities \(\mbox{lhs} \) and \(\mbox{rhs} \) is denoted by \(lhs = rhs\text{.}\) Assignment of \(rhs \) to \(lhs \) is denoted by \(lhs \becomes rhs\) (\(lhs \) becomes \(rhs\)). In the context of a program, the statements \(lhs := rhs\) and \(lhs := \fl{ \mbox{rhs} }\) are equivalent. Given an assignment

\begin{equation*} \kappa := \mbox{expression}, \end{equation*}

we use the notation \(\check \kappa\) (pronounced "check kappa") to denote the quantity resulting from \(\fl{ \mbox{expression} }\text{,}\) which is actually stored in the variable \(\kappa\text{:}\)

\begin{equation*} \check \kappa = \fl{expression}. \end{equation*}

In future discussion, we will use the notation \([ \cdot ] \) as shorthand for \(\fl{ \cdot } \text{.}\)

Subsubsection Standard Computational Model (SCM)

The Standard Computational Model (SCM) assumes that, for any two floating point numbers \(\chi\) and \(\psi\text{,}\) the basic arithmetic operations satisfy the equality

\begin{equation*} \fl{ \chi\ \mbox{op}\ \psi } = (\chi\ \mbox{op}\ \psi)( 1 + \epsilon), \vert \epsilon \vert \leq \meps, \ {\rm and\ \ op}\ \in \{+, -, *, /\}. \end{equation*}

The quantity \(\epsilon\) is a function of \(\chi, \psi\) and \({\rm op}\text{.}\) Sometimes we add a subscript \((\epsilon_+, \epsilon_*, \cdots)\) to indicate what operation generated the \((1 + \epsilon)\) error factor. We always assume that all the input variables to an operation are floating point numbers.


We can interpret the SCM as follows: These operations are performed exactly and it is only in storing the result that a roundoff error occurs.

What really happens is that enough digits of the result are computed so that the net effect is as if the result of the exact operation was stored.

Given \(\chi, \psi \in F \text{,}\) performing any operation \({\rm op} \in \{ +, -, *, / \} \) with \(\chi \) and \(\psi \) in floating point arithmetic, \(\fl{ \chi\ {\rm op}\ \psi } \) yields a result that is correct up to machine precision: Let \(\zeta = \chi\ {\rm op}\ \psi \) and \(\check \zeta = \zeta + \delta\!\zeta = \fl{ \chi\ {\rm op}\ \psi } \text{.}\) Then \(\vert \delta\!\zeta \vert \leq \meps \vert \zeta \vert \) and hence \(\check \zeta \) is close to \(\zeta \) (it has \(k \) correct binary digits).


Consider the operation

\begin{equation*} \kappa = 4 / 3, \end{equation*}

where we notice that both \(4 \) and \(3 \) can be exactly represented in our floating point system with \(\beta = 2 \) and \(t = 4 \text{.}\) Recall that the real number \(4/3 = 1.3333\cdots \) is stored as \(.1010 \times 2^1, \) if \(t = 4 \) and truncation is employed. This equals \(1.25 \) in decimal representation. The relative error was \(0.0625 \text{.}\) Now

\begin{equation*} \begin{array}{l} \check \kappa \\ ~~~ = ~~~~ \\ \fl{ 4 / 3 } \\ ~~~ = ~~~~ \\ 1.25 \\ ~~~ = ~~~~ \\ 1.333\cdots + (-0.0833\cdots) \\ ~~~ = ~~~~ \\ 1.333\cdots \times \left( 1 + \frac{-0.08333\cdots}{1.333\cdots} \right) \\ ~~~ = ~~~~ \\ 4/3 \times ( 1 + (-0.0625)) \\ ~~~ = ~~~~ \\ \kappa( 1 + \epsilon_/ ), \end{array} \end{equation*}


\begin{equation*} \vert \epsilon_{/} \vert = 0.0625 \leq \begin{array}[t]{c} \underbrace{ 0.125 } \\ \meps = 2^{-(t-1)} \end{array}. \end{equation*}

Subsubsection Alternative Computational Model (ACM)

For certain problems it is convenient to use the Alternative Computational Model (ACM) [21], which also assumes for the basic arithmetic operations that

\begin{equation*} \fl{ \chi\ {\rm op}\ \psi } = \frac{\chi\ {\rm op}\ \psi}{1 + \epsilon}, \vert \epsilon \vert \leq \meps, \ {\rm and\ \ op}\ \in \{+, -, *, /\}. \end{equation*}

As for the standard computation model, the quantity \(\epsilon\) is a function of \(\chi, \psi\) and \(\mbox{\rm op}\text{.}\) Note that the \(\epsilon\)'s produced using the standard and alternative models are generally not equal. The Taylor series expansion of \(1/(1+\epsilon) \) is given by

\begin{equation*} \frac{1}{1+\epsilon} = 1 + (- \epsilon) + O( \epsilon^2 ), \end{equation*}

which explains how the SCM and ACM are related.

The ACM is useful when analyzing algorithms that involve division. In this course, we don't analyze in detail any such algorithms. We include this discussion of ACM for completeness.


Sometimes it is more convenient to use the SCM and sometimes the ACM. Trial and error, and eventually experience, will determine which one to use.