Binary Lambda Calculus

<p>Binary Lambda Calculus</p>

Binary lambda calculus (BLC) is a minimal, pure functional programming language invented by John Tromp in 2004,1 based on a binary encoding of the untyped lambda calculus in De Bruijn index notation.

Background

BLC is designed to provide a very simple and elegant concrete definition of descriptional complexity (Kolmogorov complexity).

Roughly speaking, the complexity of an object is the length of its shortest description.

To make this precise, we take descriptions to be bitstrings, and identify a description method with some computational device, or machine, that transforms descriptions into objects. Objects are usually also just bitstrings, but can have additional structure as well, e.g., pairs of strings.

Originally, Turing machines, the most well known formalism for computation, were used for this purpose. But they are somewhat lacking in ease of construction and composability. Another classical computational formalism, the lambda calculus, offers distinct advantages in ease of use. The lambda calculus doesn't incorporate any notion of I/O though, so one needs to be designed.

Binary I/O

Adding a readbit primitive function to lambda calculus, as Chaitin did for LISP, would destroy its referential transparency, unless one distinguishes between an I/O action and its result, as Haskell does with its monadic I/O. But that requires a type system, an unnecessary complication.

Instead, BLC requires translating bitstrings into lambda terms, to which the machine (itself a lambda term) can be readily applied.

Bits 0 and 1 are translated into the standard lambda booleans B0 = True and B1 = False:

True = λxλy. x
False = λxλy. y

which can be seen to directly implement the if-then-else operator.

Now consider the pairing function

⟨,⟩ = λxλyλz. zxy applied to two terms M and N

M, N⟩ = λz. zMN Applying this to a boolean yields the desired component of choice.

A string s = b0b1bn−1 is represented by repeated pairing as

Bb0, ⟨Bb1…⟨Bbn − 1, z⟩…⟩⟩ which is denoted as s:z.

The z works as a list continuation, that could be a nil list (to end the string) or another string (that would be appended to the original string).

Delimited versus undelimited

Descriptional complexity actually comes in two distinct flavors, depending on whether the input is considered to be delimited.

Knowing the end of your input makes it easier to describe objects. For instance, you can just copy the whole input to output. This flavor is called plain or simple complexity.

But in a sense it is additional information. A file system for instance needs to separately store the length of files. The C language uses the null character to denote the end of a string, but this comes at the cost of not having that character available within strings.

The other flavor is called prefix complexity, named after prefix codes, where the machine needs to figure out, from the input read so far, whether it needs to read more bits. We say that the input is self-delimiting. This works better for communication channels, since one can send multiple descriptions, one after the other, and still tell them apart.

In the I/O model of BLC, the flavor is dictated by the choice of z. If we set it to the unsolvable term

Ω = (λx. xx)(λx. xx) then, to avoid getting stuck in a loop, the machine must avoid examining it, by working in a self-delimiting manner. If on the other hand we use a lambda term specifically designed to be easy to distinguish from any pairing, then the input becomes delimited. BLC chooses False for this purpose but gives it the more descriptive alternative name of Nil. Dealing with lists that may be Nil is straightforward: since

x, y⟩ M N = M x y N and

Nil M N = N one can write functions M and N to deal with the two cases, the only caveat being that N will be passed to M as its third argument.

Universality

We can find a description method U such that for any other description method D, there is a constant c (depending only on D) such that no object takes more than c extra bits to describe with method U than with method D. And BLC is designed to make these constants relatively small. In fact the constant will be the length of a binary encoding of a D-interpreter written in BLC, and U will be a lambda term that parses this encoding and runs this decoded interpreter on the rest of the input. U won't even have to know whether the description is delimited or not; it works the same either way.

Having already solved the problem of translating bitstrings into lambda calculus, we now face the opposite problem: how to encode lambda terms into bitstrings?

Lambda encoding

First we need to write our lambda terms in a particular notation using what is known as De Bruijn indices. Our encoding blc(M) of term M is then defined recursively as follows

blc(λM) = 00 blc(M)

blc(M N) = 01 blc(M) blc(N)

blc(i) = 1i0

For instance, the pairing function λxλyλz.zxy is written λλλ.132 in De Bruijn format, with blc code 00 00 00 01 01 10 1110 110.

A closed lambda term is one in which all variables are bound, i.e. without any free variables. In De Bruijn format, this means that an index i can only appear within at least i nested lambdas. The number of closed terms of size n bits is given by sequence A114852 of the On-Line Encyclopedia of Integer Sequences.

The shortest possible closed term is the identity function blc(λ1) = 0010. In delimited mode, this machine just copies its input to its output.

So what is this universal machine U? Here it is, in De Bruijn format (all indices are single digit):

(λ11)(λλλ1(λλλλ3(λ5(3(λ2(3(λλ3(λ123)))(4(λ4(λ31(21))))))

(1(2(λ12))(λ4(λ4(λ2(14)))5))))(33)2)(λ1((λ11)(λ11)))

This is in binary:

0101000110100000000101011000000000011110000101111110011110
0001011100111100000011110000101101101110011111000011111000
0101111010011101001011001110000110110000101111100001111100
0011100110111101111100111101110110000110010001101000011010
(only 232 bits (29 bytes) long)

A detailed analysis of machine U may be found in the first reference.

Complexity, concretely

In general, we can make complexity of an object conditional on several other objects that are provided as additional argument to the universal machine. Plain (or simple) complexity KS, prefix complexity KP, and an older form of prefix complexity now obsoleted by KP, are defined by

KS(x|y1, …, yk) = min {ℓ(p) | U (p : Nily1 … yk=  x    }

KP(x|y1, …, yk) = min {ℓ(p) | U (pΩ ) y1 … yk=  x     }

Kp(x|y1, …, yk) = min {ℓ(p) | U (pz  ) y1 … yk = ⟨x, z⟩}

Theorems, concretely

The identity program λ1 proves that

KS(x) ≤ ℓ(x) + 4

The program λλ1((λ11)(λλλλ2(44)(λλ32(32(2(51(21)))))))(λλ1)(λλλ1(λ4(λ4(λ132))))(λλλ1(3(λλ1))2)(λ1)2 proves that

KP(x|ℓ(x)) ≤ ℓ(x) + 188

The program

(λ11)(λλλ1(λ1(3(λλ1))(44(λ1(λλλ1(λ4(λλ52(52(31(21))))))4(λ1)))))(λλλ1(3((λ11) (λλλλ1(λ55(λλ356(λ1(λλ612)3))(λλ5(λ143)))(31))(λλ1(λλ2)2)(λ1))(λλ1))2)

proves that

KP(x) ≤ ℓ(code(x)) + 338 where code(x) is the Levenstein code for x defined by

code(0) = 0

code(n + 1) = 1 code(ℓ(n)) n in which we identify numbers and bitstrings according to lexicographic order. This code has the nice property that for all k,

ℓ(code(n)) ≤ ℓ(n) + ℓ(ℓ(n)) + ⋯ + ℓk − 1(n) + O(ℓk(n)) Furthermore, it makes lexicographic order of delimited numbers coincide with numeric order.

Number

String

Delimited

0

0

1

0

10

2

1

110 0

3

00

110 1

4

01

1110 0 00

5

10

1110 0 01

6

11

1110 0 10

7

000

1110 0 11

8

001

1110 1 000

9

010

1110 1 001

The prefix complexity of Church numerals themselves is even better:

KP(Church numeral n) ≤ ℓ(code(n)) + 139

Complexity of Sets

The complexity of a set of natural numbers is the complexity of its characteristic sequence, an infinite lambda term in the Infinitary Lambda Calculus.

The program

λ(λ1(1((λ11)(λλλ1(λλ1)((λ441((λ11)(λ2(11))))(λλλλ13(2(64)))))(λλλ4(13)))))(λλ1(λλ2)2)

whose first 100 bits of output are

0011010100010100010100010000010100000100010100010000010000010100000100010100000100010000010000000100

proves that

KS(PRIMES) ≤ 167 (a prime) while a simple variation proves

KP(PRIMES) ≤ 176

This is even shorter than Haskell's 23 byte long

`` nubBy(((>1).).gcd)[2..]``

Symmetry of information

The program

(λ1(λλ2)(λ1(λλ1))(λλλ(λ1(32)(λλ31(611(4((λ11)(λλλλλ12(λ1(λλ1)(λλλ82)(λλ1)(4(66)(λ4(λ172))))))5)(λλ1)(λλλλ1)(λλ2)(λλ1))(λλλ1(λ164)2)))(4(λ1)(λ1)(λ1)1))(λ1))(λλ(λ11)(λλλλ1(λ(λλλλ3(λ6(3(λ2(3(λ14(λ3(λ123))))(4(λ4(λ14(31)(21))))))(1(2(λ12))(λλ5(λ5(λ2(15)))76))(λ6(λ132))))(λ4(λ132)))(44)3))

proves that

KP(x, y) ≤ KP(x) + KP(y|x*) + 657 where x* is a shortest program for x.

This inequality is the easy half of an equality (up to a constant) known as Symmetry of information. Proving the converse

KP(y|x*) ≤ KP(x, y) − KP(x) + O(1) involves simulating infinitely many programs in dovetailing fashion, seeing which ones halt and output the pair of x (for which a shortest program is given) and any y, and for each such program p, requesting a new codeword for y of length ℓ(p) − KP(x). The Kraft inequality ensures that this infinite enumeration of requests can be fulfilled, and in fact codewords for y can be decoded on the fly, in tandem with the above enumeration. Details of this fundamental result by Chaitin can be found in.2

A quine

The term Q = λ1((λ11)(λλλλλ14(3(55)2)))1 concatenates two copies of its input, proving that

KS(xx) ≤ ℓ(x) + 66 Applying it to its own encoding gives a 132 bit quine:

U(blc(Q) blc(Q) : Nil) = blc(Q) blc(Q)

Compression

So far, we've seen surprisingly little in the way of actually compressing an object into a shorter description; in the last example, we were only breaking even. For ℓ(x) > 66 though, we do actually compress xx by ℓ(x) − 66 bits.

What could be the shortest program that produces a larger output? The following is a good candidate: the program (λ1111(λλ1(λλ1)2))(λλ2(21)), of size 55 bits, uses Church numerals to output exactly 2222 = 65536 ones. That beats both gzip and bzip2, compressors that need 344 and 352 bits respectively, to describe the same output (as an 8192 byte file with a single letter name).

Halting probability

The halting probability of the prefix universal machine is defined as the probability it will output any term that has a closed normal form (this includes all translated strings):

Ωλ = ∑U(p : z) ∈ NF  2 − ℓ(p)

With some effort, we can determine the first 3 bits of this particular number of wisdom:

Ωλ = .000…2 and either the 4th or 5th bit must be 1.

BLC8: byte sized I/O

While bit streams are nice in theory, they fare poorly in interfacing with the real world. The language BLC8 is a more practical variation on BLC in which programs operate on a stream of bytes, where each byte is represented as a delimited list of 8 bits in big-endian order.

BLC8 requires a more complicated universal machine:

U8 = λ1((λ11)(λ(λλλ1(λλλ2(λλλ(λ7(10(λ5(2(λλ3(λ123)))(11(λ3(λ31(21)))))3)(4(1(λ15)3)(10(λ2(λ2(16)))6)))8)(λ1(λ87(λ162))))(λ1(43)))(11))(λλ2((λ11)(λ11))))

The parser in U8 keeps track of both remaining bytes, and remaining bits in the current byte, discarding the latter when parsing is completed. Thus the size of U8, which is 355 bits as a BLC program, is 45 bytes in BLC8.

Brainfuck

The following BLC8 program

(λ11)(λ(λλλ1(λ(λ2111(λλ133(λλ1(λλ(λ7(1(3(λλλλλ10(1(λ6143))(λ15(65432)))(λλ2((λ11)(λλλ2(λλλ662(λλ6(λ1(26)3)(15(51(λ1))(5(λ1)1))))(12(λλλ312)))1(λλ2)))))(3(1(λλλλ9(1(λ51(λ154)))(24(λ142))))(5(11(λ1))(12(λ2((λ11)(λλλ1((λ11)(λλλ2(1(33))(λ8(771))))21)))))))(λ12(λ12(λ3(21))))))))(λλ1)))(11))(λ(λ11)(λλ1((λ1(1(1(λλ1(λλ2)2))))(λλ2(21))(λλ1))(22))(1(λλλλλλ1))1)

is an unbounded tape Brainfuck interpreter in 829 bits (under 104 bytes). The formatting obscures the occurrence some double digit indices: a 10, a 15, an 11 and three 12s.

This provides a nice example of the property that universal description methods differ by at most a constant in complexity. Writing a BLC8 interpreter in Brainfuck, which would provide a matching upper bound in the other direction, is left as an exercise for die-hard Brainfuck programmers.

The interpreter expects its input to consist of a Brainfuck program followed by a `]` followed by the input for the Brainfuck program. The interpreter only looks at bits 0,1,4 of each character to determine which of `,-.+<>][` it is, so any characters other than those 8 should be stripped from a Brainfuck program. Consuming more input than is available results in an error (the non-list result λx.x).

This interpreter is a rough translation of the following version written in Haskell

``````import System.Environment(getArgs)
import Control.Applicative hiding ((<|>),many)
import Text.ParserCombinators.Parsec

putc = do (     _,      _,b,      _) <- get; tell [b]
getc = do (  left,  right,_,b:input) <- get; put (  left,  right,     b,input)
prev = do (l:left,  right,b,  input) <- get; put (  left,b:right,     l,input)
next = do (  left,r:right,b,  input) <- get; put (b:left,  right,     r,input)
decr = do (  left,  right,b,  input) <- get; put (  left,  right,pred b,input)
incr = do (  left,  right,b,  input) <- get; put (  left,  right,succ b,input)
loop body = do (_,_,b,_) <- get; when (b /= '\0') (body >> loop body)
parseInstr = liftM loop (between (char '[') (char ']') parseInstrs)
<|> prev <\$ char '<'
<|> next <\$ char '>'
<|> decr <\$ char '-'
<|> incr <\$ char '+'
<|> putc <\$ char '.'
<|> getc <\$ char ','
<|> return () <\$ noneOf "]"
parseInstrs = liftM sequence_ (many parseInstr)
main = do [name] <- getArgs