This article is about lazy evaluation, an evaluation strategy, i.e., an operational semantics: it specifies the steps a computer takes to get things done. Lazy evaluation is a popular way to implement most of Haskell, but not the only way.
Haskell itself is not lazy; for most things it specifies only non-strictness, not what steps a computer takes. Non-strictness is a word from denotational semantics, which is only about relating expressions to values. Non-strictness means, for example,
const True ⊥ = True
(⊥ is a theoretical value to stand for “no information” or “no answer”. I cannot say “non-termination” because that would require talking about steps.) In other words, non-strictness means that an expression can have an answer even if some of its arguments have no answer.
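For a concrete check (my own illustration, not part of the original examples): in GHCi, undefined can play the role of ⊥, and
const True undefined
prints True, because the answer does not depend on the second argument.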
Denotational semantics says whether you get an answer and what answer, but does not explain or control the amount of time and memory used to get it. This is when you need the operational semantics given by compilers and interpreters; fortunately, most of them give lazy evaluation, coarsely speaking. Lazy evaluation suffices to explain asymptotic complexities such as why some code uses Θ(n) memory and why an alternative uses Θ(1) memory. Even such a coarse understanding without real cycle counts and byte counts already solves a lot of mysteries because most programmers do not even get this far.
If actual cycle counts and byte counts matter, if you need to explain why some code uses 40 MB memory and why an alternative uses just 10 MB, then lazy evaluation does not suffice, and you need the low-level operational semantics of your compiler or interpreter, including code optimizations. This is outside the scope of this article. But seeing that the low-level operational semantics is a refinement of lazy evaluation, you should probably know lazy evaluation first.
Knowing lazy evaluation is not the best way to explain what answer you get, if that is all you want regardless of complexity. It can be done, and most people do it (out of poor education), but it is too low-level and tedious, missing the forest for the trees, nay, tree leaves. Denotational semantics is better, but still not high-level enough. Axiomatic semantics combined with program derivation is the most suitable, and I have triumphant examples of the axiomatic, derivational style: scanl examples, fix examples, and a primitive recursion example.
The main points of lazy evaluation are pretty straightforward, just unfamiliar: short-circuit evaluation, and sharing the result (if the thing evaluated has multiple parents). This is of course vague; I illustrate below and fully clarify in subsequent sections.
To illustrate short-circuit evaluation, I evaluate (\c _ -> c) True (3*4+3*4):
(\c _ -> c) True (3*4+3*4)
True
In particular, the expression 3*4+3*4 is left alone because the rule for function application doesn't evaluate it.
A longer example, to show what pattern-matching evaluates:
case (\c _ -> c) (Just (3*4+3*4)) 4 of { Just _ -> True; Nothing -> False }
(\c _ -> c) (Just (3*4+3*4)) 4
Just (3*4+3*4)
case Just (3*4+3*4) of { Just _ -> True; Nothing -> False }
True
The sub-expression is evaluated only as far as needed to tell which pattern matches; 3*4+3*4 is again left unevaluated.
But sometimes we are also interested in evaluating 3*4+3*4, for example if we expressly want it done, or if the pattern above were Just 0. Here is how.
The operators (+) and (*) for built-in numeric types (Int, Integer, Float, Double) are eager, not lazy: to evaluate e+f (say), both e and f are evaluated first. Let us also pick the evaluation order so that we evaluate e, then f, then get back to finish e+f; similarly for e*f.
I now evaluate 3*4+3*4:
3*4+3*4
3*4
12
12+3*4
3*4
12
12+12
24
Now I illustrate sharing. I evaluate (\x->x+x) (3*4):
(\x->x+x) (3*4)
3*4
12
24
By mentioning the same formal argument x multiple times, lazy evaluation requires sharing the actual argument 3*4, as well as everything that happens to it subsequently, such as it becoming 12.
The same sharing applies to bindings in pattern-matching, let, where, and top-level definitions. (For example, let x=3*4 in x+x is almost identical to the above.)
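For concreteness, here is my own rendering of that trace, in the same style:
let x=3*4 in x+x
x+x    {both occurrences of x point to one shared 3*4 node}
3*4
12
24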
However, sharing does not apply to 3*4+3*4 per se, unless the compiler applies an optimization called common-subexpression elimination.
This whirlwind tour gives a glimpse of lazy evaluation but leaves the precise rules unsaid. These are spelled out in subsequent sections.
The notation I use to depict expressions is designed to make explicit what is shared (and what is not shared); this is very important because sharing makes a world of difference in lazy evaluation. The notation also happens to be close to how most compilers and interpreters represent expressions.
A typical expression consists mostly of function applications and constructors, like let x=blah in f x : g x. In the extreme, I may draw a graph in which a node stands for each function application or constructor, with explicit pointers to functions and arguments. This extreme takes a lot of space and a lot of work. More often, I like to merge most nodes into one, drawing separate nodes only for shared things (in this example blah).
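Roughly, in text form (my own rendering; the actual drawings use boxes and arrows, and the node names here are made up), the fully explicit graph names every application and constructor node:
root: a1 : a2
a1:   f x
a2:   g x
x:    blah
while the merged drawing keeps only the shared node separate:
root: f x : g x    {both occurrences of x point to the node below}
x:    blah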
Furthermore, I don't draw separate nodes for lambda-bound variables, though they may occur multiple times. So for example, I leave (\x -> f x : g x) as it is. Lambda-bound variables represent holes to be filled in, rather than pointers to more nodes.
I call these Inlined Expression Graphs: they are like expression graphs, but I inline (merge) unshared nodes to save space and work.
This notation is similar to that in the book Introduction to Functional Programming using Haskell, 2nd edition, by Richard Bird. I add boxes around nodes; I also have slightly more separate nodes.
There is a text-only notation in the paper A Natural Semantics for Lazy Evaluation, by John Launchbury. It is like using a huge let-expression to represent an expression graph, except that it does away with the keywords “let” and “in”. It binds names to nodes (except the root), so pointers can be serialized as target node names. It remains to write down the binding of names to node contents, and to distinguish the root. So for example, let x=3*4 in x+x is written as:
{x↦t*f, t↦3, f↦4} : x+x
(It uses the convention that every argument must be a separate node.)
Likewise, the book The Haskell School of Expression, by Paul Hudak, uses a text-only notation based on where-clauses and bindings when sharing is important, e.g., x+x where x=3*4. I used to use this notation when contributing to the Haskell Wiki a long time ago.
I now decide against a text-only notation because I think that most readers like to see pictures when sharing happens.
I write evaluation steps from start to end as a vertical sequence: the starting expression at the top, and below it the expression after each step.
Sometimes, evaluation requires zooming into a sub-expression and processing it, before zooming out and continuing with the parent expression. I indicate this by an indented, nested evaluation of the sub-expression.
This format is a fragment of Structured Derivation by Ralph-Johan Back. The most important feature I use here is nesting, which is what “structured” refers to. Most other presentations are unnested (including Bird's, Hudak's, all webpages before this one; fortunately, Launchbury's has nesting), i.e., they repeat the whole parent expression at every step.
A nested presentation is superior in several ways. It separates the two concerns—which rule triggers zooming into blah, which rule transforms blah—and explains them separately. It cuts the bloat and confusion of copying the parent expression over and over again when the sub-expression takes many steps to process. It is closer to what most compilers and interpreters do: when zooming in, one stack unit is reserved; when zooming out, one stack unit is freed. Thus, nesting depth is stack depth, and it completely explains stack overflows.
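To make the contrast concrete with the earlier 3*4+3*4 example (my own rendering): an unnested presentation repeats the parent expression at every step,
3*4+3*4
12+3*4
12+12
24
whereas a nested presentation zooms into 3*4 on an indented line, rewrites it to 12 there, and only then returns to the parent.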
The evaluation rule for function application: evaluate the function until it is a lambda term, then plug in the arguments. Multiple occurrences of a formal argument must all point to the same actual argument node (sharing).
Example: (\c -> (c, 0)) blah
(\c -> (c, 0)) blah
(blah, 0)
Example: (\c -> (c, c)) blah
(\c -> (c, c)) blah
(blah, blah)    {the two components point to one shared blah node}
The above examples are easy because of two simplifications: the function is already a lambda term, and there is only one argument. The following example shows the first complication, where the function needs its own processing first; it uses an evaluation rule covered later.
Example: (if True then (\c -> (c, 0)) else (\c -> (c, 1))) blah
(if True then (\c -> (c, 0)) else (\c -> (c, 1))) blah
(if True then (\c -> (c, 0)) else (\c -> (c, 1)))
(\c -> (c, 0))
(\c -> (c, 0)) blah
(blah, 0)
The second complication is about “multiple” arguments like f x y. Technically, this is (f x) y, i.e., a function application with one argument y, but the function is yet another function application.
Example: (\c _ -> c) True (3*4+3*4)
(\c _ -> c) True (3*4+3*4)
(\c _ -> c) True
(\_ -> True)
(\_ -> True) (3*4+3*4)
True
It is a hassle to add N-1 nesting levels for what can be casually read as 1 function and N arguments. In practice, I will shorten the evaluation steps to plugging in all N arguments at once. This shortcut is safe for asymptotic complexity because the Haskell type system limits the number of arguments below program size. So this shortcut does not mistake a Θ(n) computation for a Θ(1) computation; it only hides a constant factor. This shortcut is also taken by most compilers and interpreters.
Example: (\c _ -> c) True (3*4+3*4)
(\c _ -> c) True (3*4+3*4)
True
If the function is a name with a definition elsewhere, I have two choices: I could spend a step turning the name into the corresponding lambda term, and then plug in the arguments; or I could just jump to plugging in the arguments. So here is another shortcut: I will jump to plugging in the arguments. It is less clutter, especially when later I cover functions defined by pattern matching. It is safe for asymptotic complexity. It is also close to most compilers and interpreters.
Example: const True (3*4+3*4)
given: const c _ = c
const True (3*4+3*4)
True
A later section covers function applications in which the functions are defined by pattern matching.
The evaluation rule for comparison and arithmetic of built-in numeric types (Int, Integer, Float, Double): evaluate the operands (in operand order, usually) until they are final answers, then evaluate the expression in question until it is a final answer. For example: to evaluate e+f, evaluate e, then evaluate f, then evaluate e+f.
(For user-defined types, evaluation order and eagerness depend on implementation, since comparison and arithmetic for them are also user-defined via the type classes Eq, Ord, Enum, Num, etc. User-defined code can do anything it likes. There are well-known user-defined naturals with partly lazy comparison and fully lazy succ, for example.)
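For instance, here is a sketch of such naturals (my own illustration; the names Nat, infinity, and leN are made up):

data Nat = Zero | Succ Nat

-- Construction is fully lazy: Succ wraps without inspecting its argument.
infinity :: Nat
infinity = Succ infinity

-- Comparison is partly lazy: it stops as soon as one side runs out.
leN :: Nat -> Nat -> Bool
leN Zero     _        = True
leN (Succ _) Zero     = False
leN (Succ m) (Succ n) = leN m n

-- leN (Succ Zero) infinity evaluates to True after peeling one Succ from each side.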
Example: 3*4+3*4
3*4+3*4
3*4
12
12+3*4
3*4
12
12+12
24
Example: 0*(3+4)
0*(3+4)
3+4
7
0*7
0
The evaluation rule for most operations on Char (comparison, most Enum operations, conversion from/to Int) is similar: evaluate the operands (in operand order, usually) until they are final answers, then evaluate the expression in question until it is a final answer.
Example: succ 'g' == pred (pred 'j')
succ 'g' == pred (pred 'j')
succ 'g'
'h'
'h' == pred (pred 'j')
pred (pred 'j')
pred 'j'
'i'
pred 'i'
'h'
'h' == 'h'
True
if-then-else can be regarded as a special case of pattern matching. I cover if-then-else first—it's easy.
The evaluation rule for if b then et else ef: evaluate b until it becomes True or False, then go with et or ef, respectively.
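In fact, if b then et else ef behaves exactly like the following case expression, so this rule is an instance of the pattern-matching rule in the next section:
case b of { True -> et; False -> ef }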
Example: if 2>2 then () else ()
if 2>2 then () else ()
2>2
False
if False then () else ()
()
The evaluation rule for case e of {…}: evaluate the argument e enough to find the first pattern that matches (and enough to see that it does not match the patterns before it), then bind the variables in the chosen pattern, and go with the chosen branch; or, if no pattern matches, it is a runtime error.
There are a lot of variations in how much evaluation is enough and how much binding and sharing happens, depending on the patterns. In this section and several case studies shortly after, all patterns have at most one level, e.g., there are Just x and x:xs, but none nested like Left (Just x : (Nothing : ys)). There are also no guards. This is why the section title says “Level 1”; we shall accomplish higher levels later! So this section just focuses on level 1 patterns and sharing.
With level 1 patterns, the argument e is evaluated just until its outermost constructor is exposed, and then we know which branch to choose. This is right for types defined by data, but not quite right for those defined by newtype. I will cover newtype in the future.
Example: case const (Just a) b of { Just _ -> True; Nothing -> False }
case const (Just a) b of { Just _ -> True; Nothing -> False }
{evaluate the argument enough to check it against Just _}
const (Just a) b
Just a
case Just a of { Just _ -> True; Nothing -> False }
True
A level 1 pattern could also be a Char or numeric literal. Matching against it boils down to comparison using (==), i.e., the argument is evaluated as much as (==) specifies. Thus, for Char and built-in numbers, evaluate until the final answer; for user-defined numbers, it depends on the user-defined (==).
Example: case 3*4 of { 0 -> True; _ -> False }
case 3*4 of { 0 -> True; _ -> False }
{evaluate the argument enough to compare it with 0}
3*4
12
case 12 of { 0 -> True; _ -> False }
False
There are also level 0 patterns: just a variable or the _ wildcard. The argument is left unevaluated when matching them.
Example: case 3*4 of { x -> True }
case 3*4 of { x -> True }
True
Variables in patterns are similar to lambda-bound variables: They are holes to be filled with pointers, and when you fill them in and go with the branch, multiple occurrences point to the same node (sharing).
Example: case Just (3*4) of { Just x -> (x,x); Nothing -> (0,0) }
case Just (3*4) of { Just x -> (x,x); Nothing -> (0,0) }
(x,x)    {both occurrences of x point to the one 3*4 node}
Example: case Just (3*4) of { v@(Just x) -> (x,v); Nothing -> undefined }
case Just (3*4) of { v@(Just x) -> (x,v); Nothing -> undefined }
In the graph for this example, I draw some nodes explicitly to subtly indicate the process: We have a Just node, and it has a pointer to a 3*4 node. We give the Just node to case, and that's what v receives. Whatever our Just node points to, that's what x receives. This explains why 3*4 is shared. The following example further emphasizes that v also shares existing nodes rather than pointing to new nodes.
Example: let j = Just (3*4) in case j of { v@(Just x) -> (x, v, j) }
Here the result (x, v, j) has x pointing to the 3*4 node, while v and j point to the same Just node.
The evaluation rule for function application, where the function is defined elsewhere by pattern-matching with 1 argument: similar to a case expression, evaluate the argument enough to find the first match, then bind variables and go with the chosen RHS; or, if there is no match, abort.
Example: null (iterate id 'c')
given these definitions:
null [] = True
null (_:_) = False
iterate f x = x : iterate f (f x)
null (iterate id 'c')
{evaluate the argument enough to check it against []}
iterate id 'c'
'c' : iterate id 'c'
null ('c' : iterate id 'c')
False
(Technically I should draw a graph to show the sharing of 'c'.)
The evaluation rule for function application, where the function takes several arguments and is defined by pattern-matching, is fairly delicate to describe because the order of evaluating the arguments is delicate.
Presumably, the function definition consists of a sequence of equations. At function application, the equations are tried in the sequence order.
To try the first equation: evaluate the arguments, left to right, just enough to check them against that equation's patterns; if any pattern fails to match, jump below to try the second equation. If all patterns match, bind the variables and evaluate the guards in order; if a guard evaluates to True, go with its RHS. If all evaluate to False, jump below to try the second equation. Later equations are tried the same way. An important aspect is that arguments could be more evaluated than the chosen equation requires, because there are equations before it, and they have more specific patterns or more demanding guards, and we need that much evaluation to decide to skip them.
Example: f [] (r 'y')
given:
r x = x : r x
f [] _ = 0
f _ [] = 1
f [] (r 'y')
0
Example: f (r 'y') []
f (r 'y') []
{evaluate the first argument enough to check it against [] (for the first equation)}
r 'y'
'y' : r 'y'
f ('y' : r 'y') []
1
Example: g (r 'x') (r 'y')
given:
r x = x : r x
g (x:xs) (y:ys) | x==y = 0
g _ _ = 1
g (r 'x') (r 'y')
{evaluate the arguments and the guard enough to decide the first g equation}
r 'x'
'x' : r 'x'
r 'y'
'y' : r 'y'
'x' == 'y'
False
g ('x' : r 'x') ('y' : r 'y')
1
Function definitions that not only have patterns and guards but also where bindings can have very elaborate sharing structures. I will show them in the future when I find a good notation.
The most important purpose of knowing lazy evaluation is to explain asymptotic complexity, especially exhaustions such as stack overflow and heap leak.
Asymptotic time complexity is the number of steps in the evaluation sequence.
Asymptotic stack complexity is the depth of nesting in the evaluation sequence (how many times I zoom into sub-expressions).
Asymptotic heap complexity can be gleaned from expression sizes in the evaluation sequence, but it may take some work when there are zoom-in steps.
When there is no zooming in, every step has the complete expression graph, and so the graph size is heap size. Of course, I inline some nodes, so count both node numbers and string lengths to be sure. Richard Bird's book has another good way to count: count actual arguments. All are asymptotically accurate counts, differing in constant overheads only.
When there is zooming in, in some cases none of the steps shows the complete expression graph at the time. So you have to imagine a flattened presentation of the complete graph. Here is an example excerpted from the foldr case study later (there, f plays the role of foldr (+) 0):
f (1:2:3:[])
1 + f (2:3:[])
f (2:3:[])
2 + f (3:[])
f (3:[])
3 + f []
f []
0
At this point, the complete graph is 1 + (2 + (3 + 0))
.
The total number of actual arguments is 6 (3 binary operators).
In the foldr
case study, if you start with a list of length n,
then in the middle the complete graph has 2n actual arguments, i.e.,
Θ(n) heap space.
It is a good exercise to sum numbers (and do other tasks) with foldr, though we know it is fairly expensive.
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
foldr (+) 0 (1:2:3:[])
1 + foldr (+) 0 (2:3:[])
foldr (+) 0 (2:3:[])
2 + foldr (+) 0 (3:[])
foldr (+) 0 (3:[])
3 + foldr (+) 0 []
foldr (+) 0 []
0
3 + 0
3
2 + 3
5
1 + 5
6
This takes Θ(n) stack space; a long list will cause a stack overflow. Moreover, at the darkest moment, the whole graph for 1+(2+(3+0)) exists in the heap. This takes Θ(n) heap space.
Lest you think foldr is always a hog, here is a different example.
True || _ = True
False || x = x
foldr (||) False (True:False:False:[])
True || foldr (||) False (False:False:[])
True
If the binary operation is lazy in the 2nd argument, foldr is not a hog.
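Another standard illustration (my own supplementary example; doubleAll is a name I made up): (:) is also lazy in its 2nd argument, so foldr can produce a list incrementally, and a consumer that stops early never forces the rest.

doubleAll :: [Integer] -> [Integer]
doubleAll = foldr (\x rest -> 2*x : rest) []

-- take 2 (doubleAll [1..]) evaluates to [2,4] without touching the infinite tail.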
You have seen how foldr fails at summing numbers. Lest you think foldl is any better:
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
foldl (+) 0 (1:2:3:[])
foldl (+) (0+1) (2:3:[])
foldl (+) ((0+1)+2) (3:[])
foldl (+) (((0+1)+2)+3) []
((0+1)+2)+3
(0+1)+2
0+1
1
1+2
3
3+3
6
This uses Θ(n) stack space and Θ(n) heap space. So-called “tail call” is utterly irrelevant. After reading this example, please do me a favour: never ask about “tail call” again.
A better way to sum numbers is covered in the future, involving eagerness annotations.
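As a preview (my own sketch, using seq, which this article has not introduced yet), one such eager summing function looks like this; the next section calls a function with this behaviour goodsum:

goodsum :: Integer -> [Integer] -> Integer
goodsum acc []     = acc
goodsum acc (x:xs) = let acc' = acc + x
                     in acc' `seq` goodsum acc' xs   -- force the sum so far before recursing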
Even if we have an efficient function for summing—call it goodsum—the next surprise comes when computing averages using goodsum 0 xs / length xs.
Assume that goodsum behaves like this in general:
goodsum 5 (10:xs)
goodsum (5+10) xs
goodsum 15 xs
and these definitions for the rest:
average xs = goodsum 0 xs / length xs
genlist 0 = []
genlist n = n : genlist (n-1)
I won't need a definition for length because I won't get that far.
average (genlist 3)
goodsum 0 (genlist 3)
genlist 3
3 : genlist (3-1)
goodsum 0 (3 : genlist (3-1))
goodsum 3 (genlist (3-1))
genlist (3-1)
2 : genlist (2-1)
goodsum 3 (2 : genlist (2-1))
goodsum 5 (genlist (2-1))
genlist (2-1)
1 : genlist (1-1)
goodsum 5 (1 : genlist (1-1))
goodsum 6 (genlist (1-1))
genlist (1-1)
[]
goodsum 6 []
6
The effect is that, due to sharing, although goodsum tries to forget pieces of the list as it goes, some other part of the parent expression has a pointer to the list and keeps it. Before goodsum begins, the pointer points to just a short genlist 3; when goodsum finishes, the pointer points to a long list, which occupies Θ(n) heap space. Some people consider it a heap leak.
To show this effect more clearly, I give a flat presentation:
average (genlist 3)
{the flat presentation alternates genlist steps and goodsum steps; each step shows the complete expression graph, in which the length part of average still points to every list cell produced so far}
Thus heap size has grown to Θ(n).
Lest you think all lazy evaluation does is take up a lot of memory, the following examples show how to use lazy evaluation properly to obtain compositional, asymptotically optimal programs.
Again, assume that goodsum behaves like this in general:
goodsum 5 (10:xs)
goodsum (5+10) xs
goodsum 15 xs
and this definition:
genlist 0 = []
genlist n = n : genlist (n-1)
Evaluate goodsum 0 (genlist 3):
goodsum 0 (genlist 3)
genlist 3
3 : genlist (3-1)
goodsum 0 (3 : genlist (3-1))
goodsum 3 (genlist (3-1))
genlist (3-1)
2 : genlist (2-1)
goodsum 3 (2 : genlist (2-1))
goodsum 5 (genlist (2-1))
genlist (2-1)
1 : genlist (1-1)
goodsum 5 (1 : genlist (1-1))
goodsum 6 (genlist (1-1))
genlist (1-1)
[]
goodsum 6 []
6
For a while, it goes like the Average example. The important point this time is that, as no one else holds a pointer to the generated list, its cells are short-lived, like virtual particles. This program takes Θ(1) memory. Some compilers even optimize away the short-lived cells.
This shows an important idiom of using lazy evaluation: We have a producer, genlist 3, and we have a consumer, goodsum 0, and when put together, they behave like a pair of co-routines, passing control to the producer for one cell, then passing control to the consumer to use and lose that cell, then passing control to the producer for the next cell… We say that in Haskell we use lists as our for-loops.
The above example still does not do justice to the idiom. The idiom really shines when the consumer stops consuming early for whatever reason it feels like. The producer is not burdened with the stopping condition of the consumer. We get a cleaner separation. To be continued.
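As a small foretaste (my own illustration, reusing genlist from above; hasBig is a name I made up): the producer knows nothing about the stopping condition, yet only one cell is ever produced.

hasBig :: [Integer] -> Bool
hasBig = any (> 999999)

-- hasBig (genlist 1000000) evaluates to True after the producer yields just its
-- first cell, 1000000; the rest of the list is never generated, because any
-- short-circuits through (||).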
I have more Haskell Notes and Examples.