Universality and quines
Quines, named by Douglas Hofstadter in honor of Willard Van Orman Quine, are self-replicating programs, programs that output their own source code. It’s actually just a consequence of Turing completeness of a programming language that it’s possible to construct a quine. For example, here’s a quine in a sort of classic structure in Python:[1]
a = ['print "a =", a', 'for s in a: print s'] print "a =", a for s in a: print s
In this note, I’m going to show how to make a lambda calculus interpreter in lambda calculus (a “universal machine”), how to use it to prove a kind of fixed-point theorem for the lambda calculus, and then how to get a quine from it.
1. The lambda calculus
Alonzo Church came up with a fairly nice and simple calculus that can be used as a basis for both computation and mathematics. In his version, there are three types of symbolic expressions:
- if is a symbol (variable), where we presume we have an infinite set of symbols to draw from.
- if is a symbol and is a lambda calculus term (abstraction).[2]
- if and are lambda calculus terms (application).
- ,
- , and
- .
- where
- .
- -renaming:
- If is a lambda term and , .
- -reduction:
- If is a lambda term, then .
- -conversion:
- If , then .
There’s not really a sense in which a lambda calculus term is evaluated per se, but there are various normal forms one could try to put a particular lambda calculus term in (as functional programming languages do). But, reduction strategies have divergent terms, which are lambda calculus terms that do not stabilize while applying the reduction strategy. For example, is generally divergent.
As a shorthand, we will take application to be left-associative, meaning is , and abstraction to be right-associative, meaning is . One usually thinks of as being a two-argument function.[3] We will also sometimes not write outermost parentheses, and we will use the rule that means .
1.1. de Bruijn indexes
Symbolic variables are an annoying complexity for lambda calculus because of all the issues with free variables in -renaming and substitution. One way around this is to use the natural numbers for variables, where the number represents how many abstractions out that variable is bound from.
Lambda calculus terms when using de Bruijn indexes are
- if (variable),
- if is a term (abstraction), and
- if and are terms (application).
Then, the equivalence relation no longer needs -renaming. Substitution is
- .
The equivalence relation is generated by two rules.
- -reduction:
- .
- -conversion:
- .
For lambda calculus terms with no free variables, it is a purely mechanical process going back and forth between de Bruijn indexes. One way to start is to -rename all the bound variables so that they come from some alphabet so that each nested abstraction uses the next variable in the list, and then the index is where is from and is the abstraction depth. Terms with free variables can be dealt with by imagining a that those variables have been bound in some specified order.
I sort of like the following variation of de Bruijn indexes, where variables are written in unary rather than from the infinite alphabet .
- is a term (representing zero in ).
- is a term if is a term (the successor, representing in but extended to all terms).
- is a term if is a term.
- is a term if and are terms.
- (Errata: [4])
- .
2. Algebraic data types
Pretty much every data type that you want can be represented as some expression of products and disjoint unions of basic data types, possibly with some recursion. For example, if represents a singleton type, the the type for a linked list of things of type is recursively defined by . These sorts of algebraic expressions to represent types are ADTs (algebraic data types).
In lambda calculus, a value of a product type can be represented by the expression , where the projector functions are and . The constructor function is . The idea is that a value is something that provides its component values to some supplied function.
A disjoint union , on the other hand, is described by its inclusion functions: and , which is a sort of dual to the projectors for the product type. The idea here is that a value is something that selects a function to handle which of the types the value is from.
A famous disjoint union is the two-valued boolean algebra, of two singleton types. In an optimized representation that doesn’t use the values from the singleton types,
so if is a boolean, an if statement looks like .[5]In the case of products and disjoint unions of more than two types, there is an obvious optimization that one can apply. For example, for we can use as a constructor.
2.1. Representing lambda calculus terms
The point of introducing ADTs is that we can use them to represent lambda calculus terms within the lambda calculus itself. While it is true that technically a lambda calculus term is a string, we will be representing it in a tree form. In particular, we will represent the variation on lambda calculus terms from the end of the de Bruijn indexes section.
Let denote the type for a lambda calculus term, for which we give the following recursive ADT description
where the four terms correspond to , successor, abstraction, and application. In particular, we’ll use these constructors:As an example, the expression for can be expressed in this type as .
If we have an expression from , we can decompose it and convert it into a lambda calculus term. Write for the corresponding term, the being for “interpretation.” This is a function outside the lambda calculus, but we will show the lambda calculus is powerful enough to implement it.
3. An interpreter
In this section we’ll derive a term such that when is an expression representing a lambda calculus term from the type , then . This interpreter is the lambda calculus equivalent of a universal machine, because it can take descriptions of lambda terms and convert them into actual lambda terms, compared to descriptions of Turing machines being run as actual Turing machines.
At a high level, the term looks like the following, in a pseudo-Haskell notation. We make the decision that free variables in evaluate to a diverging term. The env variable is a linked list of lambda calculus terms (actual terms, not representations of them).
u p := eval p empty eval p env := case p of zero -> head env succ(e) -> eval e (tail env) abs(m) -> (x |-> eval m (push x env)) app(m1, m2) -> (eval m1 env) (eval m2 env) empty := nil |-> cons |-> nil push x list := nil |-> cons |-> cons x list head list := list fail (x |-> sublist |-> x) tail list := list empty (x |-> sublist |-> sublist) fail := failThe way this works is that the eval function keeps track of an environment of all the substitutions from -reduction. The interpretation of is as an actual lambda abstraction that pushes the value of the bound x to the zeroth entry of the environment list. Furthermore, empty is the empty linked list, push is the constructor, and head and tail are convenience destructors, where head fails in the case the environment is an empty list.
I’m afraid I’m going to be lazy and not explicitly prove that , but it is a matter of checking the equivalence relations (and the substitution rule). Note that we do not care about evaluation strategies because the point is just that is a lambda term in the right equivalence class.
Now, how do we deal with recursion when the lambda calculus does not seem to provide it? The combinator is a way to let a term refer to itself, and its defining relation is that . Thus, if a function needs to refer to itself, like in the factorial function
factorial n := if n <= 1 then 1 else n * factorial (n - 1)we can transform it like so
factorial' fact n := if n <= 1 then 1 else n * fact (n - 1) factorial := Y factorial'where none of the definitions are self-referential. The trick is that
factorial n = Y factorial' n = factorial' (Y factorial') n = factorial' factorial nwhere the equality signs actually mean equivalence, and hence factorial’ is able to refer to something that is equivalent to the factorial expression itself. This combinator can actually be given as a lambda calculus term: The defining relation is not too difficult to check.
Let’s look at the fail function, which we could define as
fail := Y fail' fail' f := fhence fail is , the divergent term from before.
Let’s rewrite the definition for u to be non-self-referential, while also substituting in some of the definitions for linked lists.
u p := (Y eval') p (n |-> c |-> n) eval' eval p env := case p of zero -> env omega (x |-> t |-> x) succ(e) -> eval e (env env (x |-> t |-> t)) abs(m) -> (x |-> eval m (n |-> c |-> c x env)) app(m1, m2) -> (eval m1 env) (eval m2 env) omega := (x |-> x x) (x |-> x x) Y f = (x |-> f (x x)) (x |-> f (x x))For succ(e), there is a trick that if env is empty then by passing env as the first argument the expression is equivalent to the empty list in that case.
Now, to write this in an impenetrable way as a single lambda term:
u := expr |-> ((f |-> (x |-> f (x x)) (x |-> f (x x))) (eval |-> p |-> env |-> p (env ((x |-> x x) (x |-> x x)) (x |-> t |-> x)) (e |-> eval e (env env (x |-> t |-> t))) (m |-> x |-> eval m (n |-> c |-> c x env)) (m1 |-> m2 |-> (eval m1 env) (eval m2 env))) ) expr (n |-> c |-> n)
Or, if that was too clear, using de Bruijn indexes:
The point of this last exercise was just to make it obvious that can be represented as an expression . If everything worked out, then in fact .If we do not care about the behavior for free variables, we could shorten this term by replacing with something random, like for instance, which would have the effect of making free variables evaluate to the empty list.
It’s also possible to write a universal interpreter that enforces a particular reduction strategy, for example the Krivine machine[6], with a short implementation by Scott Kovach.
4. A fixed-point theorem
Consider a lambda term with the property that for every lambda expression from , is equivalent to something in . That is, can be thought of as an actual function . We will show that then has a fixed point with respect to , namely that there is some expression from such that . (The argument is adapted from the one presented here.[7])
A key operator in the construction of is which takes a lambda expression and produces a lambda expression that when evaluated is a term that is equivalent to that expression. It’s much less complicated to say in symbols: . We can (tediously) define this operator inductively.[8]
Let be defined by , which creates an expression that represents supplying the representation of a lambda term to itself as an argument.
Let be the expression for , and let .
Hence, is indeed a fixed point of with respect to .5. Quines
A quine is an expression in such that .
Quines are the result of applying this fixed-point theorem to the operator. By that theorem, there is some in such that , but since , we have obtained a quine.
What does this quine look like? We had that was the expression for , but this simplifies to , so we may let , where is the lambda expression for and is a lambda expression for . Then, is , which expanded a bit is
It doesn’t seem like it would be too illuminating to expand this out anymore, but it is interesting that was eliminated so easily, suggesting the fixed-point theorem was tremendously overpowered for the purpose of deriving a quine.Anyway, notice that the way works is that there is a code part and a data part, and that is sort of like a print function. There’s sort of a parallel between the structure of and the Python program from the beginning.
6. In set theory
As I understand it, one of the ways to study ZFC set theory is by model theory, where a model is a set along with a relation that satisfies all of the axioms. Propositions of first-order logic can themselves be represented as sets (like how lambda terms can be represented in as lambda calculus terms), and there is a truth term that evaluates the truth of a particular proposition against a particular model.
Unfortunately there is no way to prove in ZFC that there is a model of ZFC (that’s Gödel’s second incompleteness theorem).
As an application of the idea in the fixed-point theorem, let’s show that there’s no absolute notion of Truth in ZFC. If you want, that there’s no truth term that evaluates whether or not is a true proposition in the current model.
The idea is that we can use proposition-level functions defined on set representations of propositions. If is a proposition-level function on the set of propositions (i.e., , where we will write for this and leave it up to the reader to transform the first-order statement appropriately), then we can show that there is some statement such that . Let be the set of set representations of propositions with one free variable (predicates). We can define proposition-level function that feeds a predicate the representation of the predicate. Then, we can define a predicate and thus a proposition , the required fixed point.
But, what if is the function that negates a proposition? Then we have a proposition such that , so can’t possibly be a truth term in actuality. My intuition for what is going wrong is that seems like it would need to have an environment to keep track of all of the bound variables, but ZFC prevents you from having sets with arbitrary elements: you always have to specify a domain. Well, the domain would be the model, but you don’t have a way to gain direct access to the current model from within the model. If you did, you’d have Russell’s paradox.
A kind of fun modification to this is to say “well, ok, fine, there’s no predicate, but we can still have have a universal truth predicate that checks the truth across all models, since you can write down a predicate to check whether a set is a model of set theory, so what happens now?”[9] Here’s what happens: the kind of argument above gives us a proposition such that . Gödel’s completeness theorem says that is equivalent to being provable, and so being provable is equivalent to being provable. So, if the theory is consistent, neither nor are provable (the law of the excluded middle does not apply to provability). The theory is incomplete!