Albert Y. C. Lai, trebla [at] vex [dot] net
We learn how the computer solves type inference to gain understanding (so it is not “magic”) and help explain type errors and inferred types.
A type inference algorithm uses tools that are not obvious at first sight. To help motivate them, I start with showing how an experienced or clever person does type inference heuristically, which calls for the same tools.
Infer the type of \x -> x
I don’t know what should be the type of x, but I can call it u, consider it unknown for now. Then I can deduce:
(\x -> x) :: u -> u
I will use names starting with u for unknowns, e.g., u1, u2. Note: They are not necessarily type variables for polymorphism—to be determined later. (In contrast, I will write polymorphic types like “∀a. a → a”.)
Infer the type of
\x y -> if False then (x, 5) else (True, y)
Like before, choose unknown u1 for x, u2 for y. Then I would really like to zoom into the function body
if False then (x, 5) else (True, y)
but with the knowledge x::u1, y::u2 at the back of my mind.
This suggests that a person or an algorithm keeps track of a “type environment” consisting of the variables in scope and their types for easy lookup.
It also helps with local variables from “let”. Infer the type of
let b=False in \x y -> if b then (x, 5) else (True, y)
After finishing b=False
deducing b::Bool, I zoom into
the body
\x y -> if b then (x, 5) else (True, y)
with b::Bool in the type environment.
Assume 5::Integer, True::Bool
Infer the type of
\x y -> if False then (x, 5) else (True, y)
After choosing unknowns and zooming in etc., eventually it comes down to:
(x,5) :: (u1,Integer)
(True,y) :: (Bool,u2)
if-then-else wants them to be the same type, so:
(u1,Integer) = (Bool,u2)
Solving that equation gives u1 = Bool, u2 = Integer.
Answer: Bool -> Integer -> (Bool, Integer)
\x -> if False then (x, 5) else (True, x)
Similar thought process but the equation is
(u1,Integer) = (Bool,u1)
This means Integer = Bool. This is a type error, the “type mismatch” kind.
The other kind of type errors goes like this:
\x -> if False then x else [x, x]
I use unknown u1 for x. The equation to solve goes like
u1 = [u1]
Now, theoretically there is nothing wrong with that; the feature name is “infinite types”, and a few (very few) academic toy languages have it.
But people find that it is rarely useful, it is hard to follow even
when it is useful, and if a piece of practical code seems to do it, more
than 99.9% of the time it’s a typo rather than an intention, e.g., the
author meant “then [x]
”. So all practical languages and
almost all academic languages ban it. (Statically typed languages, that
is.)
In general, if an equation to solve goes like
u1 = … something that mentions u1 …
and is not simply “u1 = u1”, we say that it is a type error, and give the reason “infinite type”. You sometimes also see the wording “occurs check”; it means that the algorithm checks whether u1 occurs on the right side, and finds it.
In the rare case that you find an infinite type useful, the recommendation is to define an ordinary recursive wrapper type:
newtype T = MkT [T]
-- so MkT :: [T] -> T
unT :: T -> [T]
unT (MkT list) = list
You now have to keep wrapping and unwrapping, but the logic is clearer because the code is explicit about when to think of one single T, when to think of a list of T’s.
I need to create new unknowns all the time.
I will run into equations like (u1,Integer) = (Bool,u2), and I need to solve them and find out u1 = Bool, u2 = Integer. This solving is called “unification”; I was unifying (u1,Integer) with (Bool,u2).
Because of that, I need a mutable table of solved unknowns. Every time I solve an equation, it causes an update. Since each entry is telling me to replace an unknown by its solution, this table is a table of substitutions.
I need the type inference algorithm to take one more parameter: a type environment to tell me which variables are in scope and what their types are. It is also a good place to store the types of library functions.
Unification solves two problems in one go: Checking whether two types match up, and if they contain unsolved unknowns, solve them, i.e., find out whether and what can be substituted for the unsolved unknowns to make the two types match up.
It doesn’t return any value. It gives its answer by how it terminates: If the two types match up (perhaps after adding substitutions to the table), return normally; if not, throw an error.
A mutable table of substitutions is maintained, initially empty. Every call to the unification algorithm can update the table. Notation for substitutions: “u := type”.
For clarity, I will make sure that solved unknowns don’t appear in the RHS, e.g., I won’t let this table happen:
u1 := [u2] (I won’t write this, I’ll write u1 := [Integer])
u2 := Integer
Remark: Real implementations allow this for efficiency and play clever tricks elsewhere to compensate. But I am going for least confusion for beginners.
Notation for applying all substitutions in the current table to rewrite a type: “apply-subst type”. This returns the rewritten type. I omit coding it up. Example: If the current table is
u3 := Integer
u2 := u1 → u1
then apply-subst (u2 → u4 → Bool) returns ((u1→u1) → u4 → Bool).
The type inference algorithm is not always aware that an unknown is already solved and has an entry in the substitution table. It can call the unification algorithm with solved unknowns. For clarity, I first use apply-subst to get rid of them:
Then the algorithm for unify-intern can assume that unknowns in inputs are unsolved.
Remark: Real implementations don’t do this, they play clever tricks elsewhere to compensate.
unify-intern is just structural recursion on the syntax of type expressions:
In this example, I start with a non-empty substitution table: It already has an entry “u1 := u2 → u3” (imagine that it was caused by a previous call to unify). We are now requested to unify u1 with u3→u4.
Current table:
u1 := u2 → u3 | |
unify u1 (u3→u4)
unify-intern (u2→u3) (u3→u4) unify-intern u2 u3 add u3:=u2 replace u3 in the rest of the table | |
Updated table:
u1 := u2 → u2 u3 := u2 | |
unify-intern (apply-subst u3)
(apply-subst u4)
unify-intern u2 u4 add u4:=u2 replace u4 in the rest of the table | |
Updated table:
u1 := u2 → u2 u3 := u2 u4 := u2 |
Later on, in example steps of the type inference algorithm, when it comes to unification, I will just write the call and the updated table (usually easy to see), like this:
Current table:
u1 := u2 → u3 | |
unify u1 (u3→u4)
unify-intern (u2→u3) (u3→u4) | |
Updated table:
u1 := u2 → u2 u3 := u2 u4 := u2 |
Notation:
“infer env term”: Procedure call to infer the
type of term; look up variables and types in env,
the type environment. The answer is meant to be monomorphic, i.e., if
there are unsolved unknowns, let them be, don’t generalize to a
polymorphic type. Example: If term is
\x -> x
, return u1→u1, leaving u1 as a new unsolved
unknown.
Again, for least confusion for beginners, we also use apply-subst on all answers before returning them.
Notation for environments: For example, an environment that says “the type of x is u3, the type of y is Bool” is written “{x::u3, y::Bool}”. The empty environment is written “{}”. When I clone an environment and add something, I write like “env plus z::u2” to add z of type u2.
The type inference algorithm is a structural recursion over the syntax of term expressions. Here are the cases:
Literals have basically hardcoded types, so just return them. For simplicity, I ignore type classes, so for example I just pretend that 4 is always Integer.
Similarly for most other literals.
Some literals are data constructors of parametrized algebraic data types, e.g., Nothing, the empty list []. I postpone them to the section on algebraic data types later.
The type of a variable is simply given in the environment, or else it’s a “variable not in scope” error—the 3rd and final source of errors in type inference. (If you care about certain library functions, be sure to include them in the initial environment!)
If the type is polymorphic, there is some follow-up work to do. But
first why it is possible: 1. many library functions have polymorphic
types; 2. the let
construct (shown later) can put
polymorphic types in the environment.
The follow-up work is to instantiate the type variables with new unknowns. (Let the context worry about solving them.)
Example of finding a polymorphic type and instantiating it:
For a lambda like \v → expr, we can create an unknown, say u, for the type of v, and wait for it to be solved later. Then we can zoom into expr, but keep in mind the additional v::u; in other words, a recursive call but give it a larger environment that has the additional v::u.
If the recursive call figures out that expr has type T, that means the whole lambda has the function type u→T.
Remark: Although u starts as a new unknown, after the recursive call for expr, u may have been solved, so u→T may need a rewrite, even though T doesn’t.
Example: \x → x
For lambdas of two or more parameters, you could treat as nested lambdas, e.g., \x → (\y → expr). If you find that tedious, we can shortcut it:
Example:
For a function application like f e, we can first use recursion to find the types of f and e respectively.
f is being used as a function, so we also need to check whether its type is a function type. Its argument is e, so we also need to check whether the domain type matches the type of e. These checks are what unification is for! We don’t have any requirement on the codomain type at this point, so just create a new unknown for it.
The type of the whole f e is the codomain type, i.e., whatever the new unknown turns out to be.
A small example: The library has id :: ∀a. a→a. Let’s use it like “id True”.
infer {id :: ∀a. a→a} (id True)
Tf := infer {id :: ∀a. a→a} id T := look up id, it’s ∀a. a→a it’s polymorphic create new unknown u1 for a T’ := instantiate ∀a. a→a T’ = u1→u1 return (apply-subst (u1→u1)) = u1→u1 Tf = u1→u1 Te := infer {id :: ∀a. a→a} True return Bool Te = Bool create new unknown u2 unify Tf (Te → u2) unify-intern (u1 → u1) (Bool → u2) | |
Updated table:
u1 := Bool u2 := Bool | |
return (apply-subst u2) = Bool
|
A larger example: \f x → f (f x)
infer {} (\f x → f (f x)):
create new unknowns u1 (for f), u2 (for x) T1 := infer {f::u1, x::u2} (f (f x)): Tf := infer {f::u1, x::u2} f return u1 Tf = u1 T2 := infer {f::u1, x::u2} (f x) Tf2 := infer {f::u1, x::u2} f return u1 Tf2 = u1 Tx := infer {f::u1, x::u2} x return u2 Tx = u2 create new unknown u3 unify Tf2 (Tx → u3) unify-intern u1 (u2 → u3) | |
Updated table:
u1 := u2 → u3 | |
return (apply-subst u3) = u3
T2 = u3 create new unknown u4 unify Tf (T2 → u4) unify-intern (u2 → u3) (u3 → u4) | |
Updated table:
u1 := u2 → u2 u3 := u2 u4 := u2 | |
return (apply-subst u4) = u2
T1 = u2 return (apply-subst (u1→u2→T1)) = (u2→u2)→u2→u2 |
For multiple parameters, e.g., f e1 e2, again you can either treat as two function applications, (f e1) e2, or do the obvious shortcut.
if-then-else requires:
An algebraic data type definition creates two aspects:
Accordingly, it means adding two groups of inference rules. I will illustrate by an example type. Then you extrapolate to other types.
data MI = Non | Has Integer
Making:
Using: For simplicity I just cover case-expressions with simple patterns:
case eMI of { Non -> e0 ; Has v -> e1 }
This is like if-then-else on steroid. We require:
So here we go:
Parameterized algebraic data types are similar, but there are additional unknowns to create for instantiating the type variables. Example:
data Maybe a = Nothing | Just a
The data constructors are polymorphic—you can think of them as:
Nothing :: ∀a. Maybe a
Just :: ∀a. a -> Maybe a
Making:
Using:
Extrapolate this to standard parameterized types such as lists and tuples.
The 2-tuple type can be understood as
data Tuple2 a b = MkTuple2 a b
except for the syntactic difference between
(e1, e2) :: (T1, T2)
and
MkTuple2 e1 e2 :: Tuple2 T1 T2
.
Likewise, with a syntactic change, the list type can be understood as
data List a = Nil | Cons a (List a)
People expect these to be legal:
let f = \x -> x -- expect f :: ∀a. a -> a
in (f True, f 4)
let e = [] -- expect e :: ∀a. [a]
in (True : e, 4 : e)
This means when “let” is used for local definitions (and similarly
for “where”), we need to generalize an inferred type to a polymorphic
type, e.g., \x -> x
is normally just u→u with an unknown
u, but here it is generalized to ∀a. a → a so f is polymorphic.
Example of having f = \x -> x
and using it two
ways:
infer {} (let f = \x → x in (f True, f 4))
T := infer {} (\x → x) … T = u → u Tg := generalize {} (u → u) env-updated := {} create new type variable a for u in (u → u) return (∀a. (u → u but replace u by a)) = ∀a. a → a Tg = ∀a. a → a infer {f :: ∀a. a → a} (f True, f 4) | |
T1 := infer {f :: ∀a. a → a} (f
True)
Tf := infer {f :: ∀a. a → a} f (∀a. a → a needs instantiation) create new unknown u2 return (u2 → u2) Tf = (u2 → u2) Te := infer {f :: ∀a. a → a} True return Bool Te = Bool create new unknown u3 unify Tf (Te → u3) unify-intern (u2 → u2) (Bool → u3) | |
Updated table:
u2 := Bool u3 := Bool | |
return (apply-subst u3) = Bool
T1 = Bool T2 := infer {f :: ∀a. a → a} (f 4) Tf := infer {f :: ∀a. a → a} f create new unknown u4 return (u4 → u4) Tf = (u4 → u4) Te := infer {f :: ∀a. a → a} 4 return Integer Te = Integer create new unknown u5 unify Tf (Te → u5) unify-intern (u4 → u4) (Integer → u5) | |
Updated table:
u2 := Bool u3 := Bool u4 := Integer u5 := Integer | |
return (apply-subst u5) =
Integer
T2 = Integer return (apply-subst (T1, T2)) = (Bool, Integer) |
Example of why we don’t generalize unknowns that appear in env-updated:
infer {} ((\y → let v = y in 3 + v) False)
We expect a type error because 3 + False should be wrong. Go through some important events in type inference to see how it should unfold:
If we generalized to v :: ∀a. a, then there would be no error:
But we don’t want that! There should be an error somewhere. That can only happen by keeping x :: uy as is:
The difference from the f = \x → x example:
It is safe to generalize independent unknowns, but unsafe to generalize unknowns that are still used in the environment.
If v = rhs is self-recursive, we need v to be in the environment when inferring rhs. We don’t yet know its type, so just create a new unknown! But later be sure to unify the new unknown with the inferred type of rhs.
Example: f = \x → x : f x
, which we expect to be ∀a. a →
[a].
infer {} (let f = \x → x : f x in …)
create new unknown uf for f Tf := infer {f :: uf} (\x → x : f x) create new unknown ux for x Tc := infer {x :: ux, f :: uf} (x : f x) Tx := infer {x :: ux, f :: uf} x return ux Tfx := infer {x :: ux, f :: uf} (f x) Tf := infer {x :: ux, f :: uf} f return uf Tx2 := infer {x :: ux, f :: uf} x return ux Tx2 = ux create new unknown ufx unify Tf (Tx2 → ufx) unify-intern uf (ux → ufx) | |
Updated table:
uf := ux → ufx | |
return (apply-subst ufx) = ufx
Tfx = ufx unify [Tx] Tfx unify-intern [ux] ufx | |
Updated table:
uf := ux → [ux] ufx = [ux] | |
return (apply-subset [Tx]) = [ux]
Tc = [ux] return (apply-subst (ux → Tc)) = ux → [ux] Tf = ux → [ux] unify uf Tf unify-intern (ux → [ux]) (ux → [ux]) Tfg := generalize {} (ux → [ux]) (See note below for why exclude u::uf in env here.) env-updated := {} create new type variable a for ux return (∀a. a → [a]) Tfg = ∀a. a → [a] infer {f :: ∀a. a → [a]} … |
By way of example, suppose we have
let f = use g, x
g = use h, y
h = use g, z
x = use y
y = use z
z = use x
in body
equivalently, the transposed call graph (e.g., f calls g, draw an edge from g to f), and identify strongly connected components (dotted circles):
It stands to reason that we should do:
First infer x, y, z together.
Then use that to infer g, h together.
Then use the above to infer f.
Finally can infer body.
infer {f :: Tfg, g :: Tgg, h :: Thg, x :: Txg, y :: Tyg, z :: Tzg} body
In general, identify strongly connected components in the transposed call graph, put them in topological sort order, and infer each component.
Programmer-provided type signatures. Type classes.
I have more Haskell Notes and Examples