This is a summary of typechecker concepts and invariants, and how to make use of them to get a more efficient kernel.
Firstly, the typechecker uses normalization by evaluation with lazy semantics. This means we have both expressions and values, where values stand for normalized expressions (expressions in weak-head normal form). Evaluation works by taking an expression and an environment and producing a value. Since this is a lazy evaluator, the environment holds thunks, which you can think of as suspended evaluations: expression/environment pairs (yes, the definitions are mutually recursive). Here's our first invariant:
- We can only ever evaluate expressions that were previously type checked. No dynamic type errors should occur when evaluating. This, in particular, means that we should not worry about impossible cases like having a constructor with too many arguments, or a projection that's out of bounds.
The second thing, which the current kernel fails to exploit, is that values are already in whnf. But there's a caveat here: our approach incorporates thunks into values, for good reasons. First, adding a new variant is cheap since the tags are big (it won't increase the layout size). Second, creating a thunk for every expression is expensive: primitives and constructors with no arguments are already in whnf, and lambdas and pi types can be immediately converted to function/pi type values by capturing their expressions directly, so these can be added to the environment directly. So here's what we're going to assume:
- Values are either thunks or expressions in whnf. Environments carry potential thunks, so whenever we want to extract a value we must force it. However, as we'll see, infer/check never work with thunks as types (types are always forced values).
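As a sketch of the design above, here's a minimal lazy NbE evaluator for the untyped lambda calculus. The names (`Expr`, `Value`, `Thunk`, `eval`) are hypothetical stand-ins, and the real kernel has many more variants; the point is to show values in whnf, environments holding thunks, and a thunk being overwritten with its result on the first force:

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Clone)]
enum Expr {
    Var(usize), // de Bruijn index
    Lam(Rc<Expr>),
    App(Rc<Expr>, Rc<Expr>),
}

// Values are already in weak-head normal form. A lambda becomes a
// closure capturing its environment directly: no thunk is allocated.
#[derive(Clone)]
enum Value {
    Closure(Rc<Expr>, Env),
}

type Env = Vec<Thunk>;

// A thunk is a suspended evaluation (expression/environment pair),
// overwritten with its result on the first force.
#[derive(Clone)]
struct Thunk(Rc<RefCell<ThunkState>>);

enum ThunkState {
    Suspended(Rc<Expr>, Env),
    Forced(Value),
}

impl Thunk {
    fn suspend(e: Rc<Expr>, env: Env) -> Thunk {
        Thunk(Rc::new(RefCell::new(ThunkState::Suspended(e, env))))
    }

    fn force(&self) -> Value {
        let v = match &*self.0.borrow() {
            ThunkState::Forced(v) => return v.clone(),
            ThunkState::Suspended(e, env) => eval(e, env),
        };
        *self.0.borrow_mut() = ThunkState::Forced(v.clone());
        v
    }
}

// Evaluation: expression + environment -> whnf value. By the invariant
// above the input was type checked, so an App head is always a Closure.
fn eval(e: &Expr, env: &Env) -> Value {
    match e {
        Expr::Var(i) => env[env.len() - 1 - i].force(),
        Expr::Lam(body) => Value::Closure(body.clone(), env.clone()),
        Expr::App(f, a) => {
            let Value::Closure(body, mut cenv) = eval(f, env);
            // Lazy semantics: the argument is suspended, not evaluated.
            cenv.push(Thunk::suspend(a.clone(), env.clone()));
            eval(&body, &cenv)
        }
    }
}
```

Note how `Lam` allocates no thunk: the closure captures the body and environment directly, which is exactly the "don't create a thunk for every expression" point above.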
Now onto the typechecker. Here's what we have to keep in mind: expressions are typed by values. These are the (approximate) signatures of the typechecker's two main functions:

```
infer : Ctx -> Expr -> Expect Value Error
check : Ctx -> Expr -> Value -> Expect () Error
```

The context Ctx is a list of types (i.e. a list of values). Inference works by building the type as we traverse the expression; checking works by simply verifying the expression against the given type. Since our expressions are fully annotated, inference always succeeds. In a way, you could view check as inference plus equality, but we should keep in mind that simply checking is often more efficient than doing inference plus equality, and there are a few places in the current kernel where a simple check would be more efficient. In fact, the following holds:
- The only place we should ever check for equality is in the default branch of check, where we check whether two types (two forced values) are equal.

This leads us to:

- We can safely assume that the two values we want to check for equality always have the same type. Therefore, whenever we need to infer the type of one of the values, we can reuse that same type for the second value.
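To make the check/infer split concrete, here's a hedged sketch using a toy simply typed lambda calculus — not the kernel's dependent types, and unlike the kernel's fully annotated expressions this toy needs an explicit annotation to infer a lambda. All names here are hypothetical. What it illustrates is the invariant above: equality is consulted only in the default branch of `check`:

```rust
#[derive(Clone, PartialEq)]
enum Ty {
    Base,
    Arrow(Box<Ty>, Box<Ty>),
}

enum Expr {
    Var(usize),         // de Bruijn index into the context
    Lam(Box<Expr>),     // unannotated lambda: checked, never inferred
    App(Box<Expr>, Box<Expr>),
    Ann(Box<Expr>, Ty), // annotation, so lambdas can be inferred
}

type Ctx = Vec<Ty>;

fn infer(ctx: &Ctx, e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::Var(i) => ctx
            .len()
            .checked_sub(i + 1)
            .and_then(|j| ctx.get(j))
            .cloned()
            .ok_or_else(|| "unbound variable".to_string()),
        Expr::App(f, a) => match infer(ctx, f)? {
            Ty::Arrow(dom, cod) => {
                // Check the argument against the domain: no equality
                // test is needed on this path.
                check(ctx, a, &dom)?;
                Ok(*cod)
            }
            _ => Err("applying a non-function".to_string()),
        },
        Expr::Ann(e, t) => {
            check(ctx, e, t)?;
            Ok(t.clone())
        }
        Expr::Lam(_) => Err("cannot infer an unannotated lambda".to_string()),
    }
}

fn check(ctx: &Ctx, e: &Expr, expected: &Ty) -> Result<(), String> {
    match (e, expected) {
        // Structural case: push the expected type inward instead of
        // re-inferring the whole subterm.
        (Expr::Lam(body), Ty::Arrow(dom, cod)) => {
            let mut ctx2 = ctx.clone();
            ctx2.push((**dom).clone());
            check(&ctx2, body, cod)
        }
        // Default branch: the only place equality is ever consulted.
        _ => {
            let inferred = infer(ctx, e)?;
            if inferred == *expected {
                Ok(())
            } else {
                Err("type mismatch".to_string())
            }
        }
    }
}
```

The `Lam` case is where checking beats inference plus equality: the expected type is decomposed once, rather than rebuilt from the subterm and then compared.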
The Lean typechecker has two features related to equality: unit-like types and proof irrelevance. The current Lean implementation infers the type of the values to see whether it is unit-like or whether it is a Prop (in which case the values are proofs). This is very wasteful. Imagine trying to show that two lists of a million elements are equal: at each node of the list we'd have to infer its type. While unit-like types are easy to check without inference (as they're either a nullary constructor or a free variable), checking whether a value is a proof is much harder. One possibility is for equality to take the type of the values as an argument, but that would require it to construct the types of the child nodes. This is something we have to think about.
Lastly, regarding lookups. The current implementation of the list lookup is something like this:

```
fn g_list_lookup(xs: GList, idx: G) -> G {
    match xs {
        GList.Cons(&x, &rest) =>
            match idx {
                0 => x,
                _ => g_list_lookup(rest, idx - 1),
            },
    }
}
```

But this doesn't have a very good memoization property. Here's an example trace:
```
g_list_lookup([1, 3, 4, 2, 1], 3) => 2
g_list_lookup([3, 4, 2, 1], 2) => 2
g_list_lookup([4, 2, 1], 1) => 2
g_list_lookup([2, 1], 0) => 2
```

You can see that if you access the fourth element, none of the previous elements are memoized: every recursive call receives a different list argument. By contrast, a function like
```
fn g_list_drop(xs: GList, idx: G) -> GList {
    match idx {
        0 => xs,
        _ =>
            match g_list_drop(xs, idx - 1) {
                GList.Cons(_, &rest) => rest,
            },
    }
}
```

will have a trace like this:
```
g_list_drop([1, 3, 4, 2, 1], 3) => [2, 1]
g_list_drop([1, 3, 4, 2, 1], 2) => [4, 2, 1]
g_list_drop([1, 3, 4, 2, 1], 1) => [3, 4, 2, 1]
g_list_drop([1, 3, 4, 2, 1], 0) => [1, 3, 4, 2, 1]
```

This means that if you look up the fourth element, all the previous elements will also be memoized, since every recursive call receives the original list as its argument. This form of lookup might be beneficial to the evaluator and typechecker, as it could allow for amortized O(1) access.
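To illustrate the claim, here's a hypothetical sketch that simulates the memoization with an explicit table keyed by (list pointer, index) — the actual runtime memoizes calls automatically, and all names here are assumptions. Because every recursive call passes the original list, a drop at index n fills the table for all indices 0..=n, so later lookups at smaller indices are cache hits:

```rust
use std::collections::HashMap;
use std::rc::Rc;

enum List {
    Nil,
    Cons(i64, Rc<List>),
}

fn cons(x: i64, xs: Rc<List>) -> Rc<List> {
    Rc::new(List::Cons(x, xs))
}

// drop with an explicit memo table; `misses` counts actual
// evaluations (cache misses). Keying by the Rc's pointer is only
// sound while the list is alive — an assumption of this sketch.
fn drop_memo(
    xs: &Rc<List>,
    idx: usize,
    memo: &mut HashMap<(usize, usize), Rc<List>>,
    misses: &mut usize,
) -> Rc<List> {
    let key = (Rc::as_ptr(xs) as usize, idx);
    if let Some(v) = memo.get(&key) {
        return v.clone();
    }
    *misses += 1;
    let v = if idx == 0 {
        xs.clone()
    } else {
        // Every recursive call passes the *original* list, so the
        // results for indices 0..idx all land in the memo table.
        match &*drop_memo(xs, idx - 1, memo, misses) {
            List::Cons(_, rest) => rest.clone(),
            List::Nil => panic!("index out of bounds"),
        }
    };
    memo.insert(key, v.clone());
    v
}

fn head(xs: &Rc<List>) -> i64 {
    match &**xs {
        List::Cons(x, _) => *x,
        List::Nil => panic!("head of empty list"),
    }
}
```

Looking up element n is then `head(drop(xs, n))`: the first access to index 3 costs four evaluations, after which every access to indices 0 through 3 is a table hit, giving the amortized O(1) behavior described above.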