```
┬─┬─────────┬─┬─┬ ┬─┬──
└─┤ ──┬──── │ │ │ ┼─┼─┬
│ ──┼─┬── │ │ │ │ ├─┘
│ ┬─┼─┼─┬ │ │ │ ├─┘
│ └─┤ │ │ │ │ │ │
│ └─┤ │ │ │ │ │
│ ├─┘ │ │ │ │
└─────┤ │ │ │ │
└───┤ │ │ │
└─┤ │ │
└─┤ │
└─┘
```

Most people believe 2^{64}-1 = 18446744073709551615, or 0xFFFFFFFFFFFFFFFF in hexadecimal,
to be the largest number representable in 64 bits. In English, it’s quite the mouthful:
eighteen quintillion four hundred forty-six quadrillion seven hundred forty-four
trillion seventy-three billion seven hundred nine million five hundred fifty-one
thousand six hundred fifteen.

That is indeed the maximum possible value of a 64-bit unsigned integer,
available as the datatype uint64_t in C or u64 in Rust.
We can easily surpass this with floating point numbers. The 64-bit
double floating point format
has a largest (finite) representable value of
2^{1024}(1-2^{-53}) ~ 1.8*10^{308}.
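Both limits are easy to sanity-check with Python's arbitrary precision integers; note that 2^{1024}(1-2^{-53}) equals 2^{1024} - 2^{971} exactly:

```python
import sys

# Largest 64-bit unsigned integer: 2^64 - 1
u64_max = 2**64 - 1
print(u64_max)  # 18446744073709551615

# Largest finite IEEE 754 double: 2^1024 * (1 - 2^-53) = 2^1024 - 2^971
print(sys.float_info.max)  # 1.7976931348623157e+308
print(int(sys.float_info.max) == 2**1024 - 2**971)  # True
```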

What if we allow representations beyond plain datatypes, such as a program small enough to fit in 64 bits? For most programming languages, there is very little you can do in a mere 8 bytes. In C, that only leaves you with the nothingness of “main(){}”.

But there are plenty of languages that require no such scaffolding. For instance, on Linux there is the arbitrary precision calculator “bc”. It happily computes the 954242 digit number 9^999999 = 35908462…48888889, which can thus be said to be representable in 64 bits. Had it supported the symbol ! for computing factorials, then 9!!!!!!! would make a much larger number representable in 64 bits.
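The digit count is easy to confirm in Python via logarithms, without materializing the million-digit number (a quick check, nothing more):

```python
import math

# Number of decimal digits of 9^999999: floor(999999 * log10(9)) + 1
digits = math.floor(999999 * math.log10(9)) + 1
print(digits)  # 954242
```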

Allowing such primitives feels a bit like cheating though. Would we allow a language that has the Ackermann function predefined, which sports the 8-byte expression ack(9,9) representing a truly huge number?
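To make that concrete, here is the Ackermann function in Python (a sketch for tiny inputs only; even ack(4, 3) is far beyond any computer, let alone ack(9, 9)):

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100000)

@lru_cache(maxsize=None)
def ack(m, n):
    """Ackermann function: grows faster than any primitive recursive function."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

print(ack(2, 3))  # 9
print(ack(3, 3))  # 61
```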

## No primitives needed

As it turns out, the question is moot. There are simple languages with no built-in primitives. Not even basic arithmetic. Not even numbers themselves. Languages in which all of those must be defined from scratch. One such language allows us to blow way past ack(9,9) in under 64 bits.

But let’s first look at another such language, one that has been particularly well studied for producing largest possible outputs. That is the language of Turing machines.

## Busy Beaver

The famous Busy Beaver function, introduced by Tibor Radó in 1962 and here
denoted BB_{TM}(n), is defined as the maximum number of 1s that an n-state
Turing machine, started on an all-0 tape, can write before halting. Note that
if we consider this output as a number M written in binary, then it only gets
credited for its length, which is log_{2}(M+1).

In 64 bits, one can fully specify a 6 state binary Turing machine, or TM for short. For each of its internal states and each of its 2 tape symbols, one can specify what new tape symbol it should write in the currently scanned tape cell, whether to move the tape head left or right, and what new internal state, or special halt state, to transition to. That takes 6*2*(2+⌈log2(6+1)⌉) = 60 bits. Just how big an output can a 6 state TM produce?
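The size calculation can be sketched in a few lines of Python (tm_bits is a name chosen here for illustration):

```python
import math

def tm_bits(n):
    """Bits needed to specify an n-state binary TM: for each of the n states and
    2 tape symbols, a new symbol (1 bit), a direction (1 bit), and one of n+1
    next states, including the halt state."""
    return n * 2 * (2 + math.ceil(math.log2(n + 1)))

print(tm_bits(6))  # 60
```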

The best known result for 6 states is
BB_{TM}(6) > 10↑↑15,
which denotes an exponential tower of fifteen 10s. Clearly, in this notation there’s not that
much difference between a number and its size in bits.
Large as this number is, it’s still pathetically small compared to even
ack(5,5), which no known TM of fewer than 10 states—amounting to 110 bits of
description—can surpass.

For that, we need to move beyond Turing machines, into the language of

## Lambda Calculus

Alonzo Church conceived the λ-calculus in about 1928 as a formal logic system for expressing computation based on function abstraction and application using variable binding and substitution.

A tiny 63 bit program in this language represents a number unfathomably larger than not only ack(9,9), but the far larger Graham’s Number as well. It originates in a Code Golf challenge asking for the “Shortest terminating program whose output size exceeds Graham’s number”, answered by user Patcail and further optimized by user 2014MELO03. With one final optimization applied, the following 63 bit program

```
01 00 01 01 01 01 01 10 10 00 00 00 01 01 01 10 1110 110 10 10 10 10 00 00 01 110 01 110 10
```

is the Binary Lambda Calculus encoding of the term

```
(λ 1 1 (λ λ λ 1 3 2 1) 1 1 1) (λ λ 2 (2 1))
```

where λ (lambda) denotes an anonymous function, and number i is the variable bound by the i-th nested λ. This is known as De Bruijn notation, a way to avoid naming variables. A more conventional notation using variable names would be

```
(λt. t t (λh λf λn. n h f n) t t t) (λf λx. f (f x))
```

The top of this post shows a graphical representation of the term. The last 16 bits of the program—making up more than a quarter of its size—encode the term λf λx. f (f x), which takes arguments f and x in turn, and iterates f twice on x. In general, the function that iterates a given function n times on a given argument is called Church numeral n, and is the standard way of representing numbers in the λ-calculus. The program, which we’ll name after its underlying growth rate, can be expressed more legibly as

```
wCubed = let { 2 = λf λx. f (f x); H = λh λf λn. n h f n } in 2 2 H 2 2 2
```
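Church numerals are easy to experiment with in any language with first-class functions. Here is a small Python sketch (two, to_int, and H are names chosen for illustration):

```python
# Church numeral 2: iterate a function twice, λf λx. f (f x)
two = lambda f: lambda x: f(f(x))

def to_int(c):
    """Convert a Church numeral back to an ordinary integer."""
    return c(lambda n: n + 1)(0)

# Applying one numeral to another exponentiates: a b = b^a.
print(to_int(two(two)))       # 4  = 2^2
print(to_int(two(two)(two)))  # 16 = 2^(2^2)

# The H combinator of wCubed: H h f n = n h f n.
H = lambda h: lambda f: lambda n: n(h)(f)(n)
print(to_int(H(two)(two)(two)))  # 65536 = 2^(2^(2^2))
```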

The next section is mostly for the benefit of readers familiar with ordinal arithmetic, and is probably better skipped by others.

## Proof of exceeding Graham’s Number

Following the great suggestion of Googology user “BMS is not well-founded”, let
us start by defining a wCubed-customized fast-growing hierarchy, a family that
assigns, to each ordinal α, a function [α] (departing from the usual f_{α}
notation for improved legibility) from natural numbers to natural numbers.
We’ll treat all numbers as Church Numerals, so we can write n f instead of the
usual f^{n} and write f n instead of f(n) as normally done in λ-calculus.

### Definitions:

- H h f n = n h f n
- H2 = H 2
- [0] n = 2 n = n^{2}
- [α+1] n = n 2 [α] n = H 2 [α] n
- [ωα+ω] n = [ωα+n] n
- [ω^{i+1}(α+1)] n = [ω^{i+1}α+ω^{i}n] n
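The first levels of this hierarchy can be run on tiny arguments. Below is a Python sketch that, as a simplifying assumption, uses plain integers and explicit iteration in place of Church numerals (iterate, level0, and succ_level are names invented here):

```python
def iterate(f, k, x):
    """Apply f k times to x."""
    for _ in range(k):
        x = f(x)
    return x

# [0] n = 2 n = n^2
level0 = lambda n: n * n

def succ_level(f):
    """[α+1] n = n 2 [α] n: iterate [α] 2^n times on n."""
    return lambda n: iterate(f, 2 ** n, n)

level1 = succ_level(level0)
print(level1(2))  # 65536: squaring iterated 2^2 = 4 times on 2
# level1(3) = 3^256 already has 123 digits; [ω^3] 2 = wCubed is beyond all computation.
```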

### Lemmas:

- 2 H 2 [ω i] n = H H2 [ω i] n =^{(Def. 1)} n H2 [ω i] n =^{(n × Def. 4)} [ω i+n] n =^{(Def. 5)} [ω(i+1)] n
- 3 H 2 [ω^{2} i] n = H (2 H 2) [ω^{2} i] n =^{(Def. 1)} n (2 H 2) [ω^{2} i] n =^{(n × Lemma 1)} [ω^{2} i+ω n] n =^{(Def. 6)} [ω^{2}(i+1)] n
- 4 H 2 [0] n = H (3 H 2) [0] n =^{(Def. 1)} n (3 H 2) [0] n =^{(n × Lemma 2)} [ω^{2} n] n =^{(Def. 6)} [ω^{3}] n

Lemma 3 gives wCubed = 2 2 H 2 2 2 = 4 H 2 [0] 2 = [ω^{3}] 2. In comparison,
Graham’s number is known to be less than the much much smaller [ω+1] 64. As it
turns out, this proof becomes almost trivial in our custom hierarchy. We start
by defining Graham’s number as a Church numeral, exploiting the fact that in
Knuth’s up-arrow notation,
3 ↑ n = 3^{n} = upify (mult 3) n, and
3 ↑^{k+1} n = (3 ↑^{k})^{n-1} 3 = (3 ↑^{k})^{n-1} (3 ↑^{k} 1) = (3 ↑^{k})^{n} 1:

### Definitions:

- mult a b f = a (b f)
- upify f n = n f 1
- g n = n upify (mult 3) 3
- Graham = 64 g 4
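The up-arrow identities above can be mirrored with ordinary integers. A Python sketch (up and g are illustrative names; only the tiniest cases are computable):

```python
def up(a, k, n):
    """Knuth's up-arrow a ↑^k n: one arrow is exponentiation,
    k+1 arrows iterate k arrows, with a ↑^k 0 = 1."""
    if k == 1:
        return a ** n
    if n == 0:
        return 1
    return up(a, k - 1, up(a, k, n - 1))

g = lambda n: up(3, n, 3)  # g n = 3 ↑^n 3

print(up(3, 1, 3))  # 27 = 3^3
print(up(3, 2, 3))  # 7625597484987 = 3^(3^3)
# g 4 = 3 ↑↑↑↑ 3 is only the first of the 64 layers in Graham = 64 g 4.
```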

### Lemmas (assuming n ≥ 3):

- mult 3 n ≤ n^{2} = [0] n
- upify [α] n = n [α] 1 < n 2 [α] n = [α+1] n
- g n = n upify (mult 3) 3 ≤^{(Lemma 1)} n upify [0] 3 <^{(n × Lemma 2)} [n] n =^{(Def. 5)} [ω] n

By Lemma 3, Graham = 64 g 4 < 64 [ω] 64 ≤ [ω+1] 64.

## A Functional Busy Beaver

Based on the λ-calculus, I recently added to OEIS a
functional Busy Beaver function BB_{λ} that,
besides greater simplicity, has the advantage of
measuring program size in bits rather than states. Note how, similar to BB_{TM}(),
the value of BB_{λ}() is not the program output considered as a number itself, but
rather the output size. And in the case of the binary λ-calculus, the size of a Church numeral n is 5n+6 bits.
The first unknown BB_{TM} is at 5 states, while the first unknown BB_{λ} is at 37 bits.
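The 5n+6 size formula follows directly from the binary encoding: 00 encodes a λ, 01 an application, and 1^i 0 the variable with De Bruijn index i. A Python sketch generating the encoding (church_blc is a name invented here):

```python
def church_blc(n):
    """Binary Lambda Calculus encoding of Church numeral n = λλ 2 (2 ... (2 1)):
    two lambdas (00 00), n applications of variable 2 (01 110), then variable 1 (10)."""
    return "0000" + "01110" * n + "10"

print(church_blc(2))       # 0000011100111010, the last 16 bits of the wCubed program
print(len(church_blc(7)))  # 41 = 5*7 + 6
```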

The growth rates of the two BB functions may be compared by how quickly they exceed
that most famous of large numbers: Graham’s number.
The current best effort for BB_{TM}, after many rounds of optimization,
is stuck at 16 states,
weighing in at over 16*2*(2+4) = 192 bits. That compares rather unfavorably with our 63 bits.

The existence of a 29-bit Ackermann-like function and of a 79-bit function growing too fast to be provably total in Peano Arithmetic also has no parallel in the realm of Turing machines, suggesting that the λ-calculus exhibits faster growth.

It further enjoys massive advantages in programmability. Modern high level pure functional languages like Haskell are essentially just syntactically sugared λ-calculus, with programmer friendly features like Algebraic Data Types translating directly through Scott encodings. The bruijn programming language is an even thinner layer of syntactic sugar for the pure untyped lambda calculus, whose extensive standard library contains many datatypes and functions. It is this excellent programmability of the λ-calculus that facilitated the creation of wCubed.

In contrast, programming a Turing machine has been called impossibly tedious, which explains why people have resorted to implementing higher level languages like Not-Quite-Laconic for writing nontrivial programs that don’t waste too many states.

In his paper The Busy Beaver Frontier, Scott Aaronson tries to answer the question

## But why Turing machines?

> For all their historic importance, haven’t Turing machines been completely superseded
> by better alternatives—whether stylized assembly languages or various codegolf languages or Lisp?
> As we’ll see, there is a reason why Turing machines were a slightly unfortunate choice
> for the Busy Beaver game: namely, the loss incurred when we encode a state transition table
> by a string of bits or vice versa.
> But Turing machines also turn out to have a massive advantage that compensates for this.
> Namely, because Turing machines have no “syntax” to speak of, but only graph structure,
> we immediately start seeing interesting behavior even with machines of only 3, 4, or 5 states,
> which are feasible to enumerate.
> And there’s a second advantage. Precisely because the Turing machine model is so ancient and fixed,
> whatever emergent behavior we find in the Busy Beaver game, there can be no suspicion that
> we “cheated” by changing the model until we got the results we wanted.
> In short, the Busy Beaver game seems like about as good a yardstick as any for gauging humanity’s
> progress against the uncomputable.

The claimed advantages for the “slightly unfortunate choice” do not hold over that even more ancient model of the λ-calculus, while the latter’s relatively straightforward binary encoding makes it a preferable yardstick for exploring the limits of computation. The real question then is “Why not λ-calculus?”, the answer to which appears to be rooted in historical accident more than anything.

## A Universal Busy Beaver

Is BB_{λ} then an ideal Busy Beaver function (apart from a historical lack of study)?
Not quite. It’s still lacking one desirable property, namely universality.

This property mirrors a notion of optimality for shortest description lengths, where it’s known as the Invariance theorem:

Given any description language L, the optimal description language is at least as efficient as L, with some constant overhead.

In the realm of beavers, this means that given any Busy Beaver function BB (based on self-delimiting programs), an optimal Busy Beaver surpasses it with at most constant lag:

for some constant c depending on BB, and for all n: BB_{opt}(n+c) ≥ BB(n)

While BB_{λ} is not universal, it’s not far from one either.
By giving λ-calculus terms access to pure binary data, as in the Binary Lambda Calculus,
function BB_{BLC} achieves universality
while lagging only 2 bits behind BB_{λ}.
It’s known to eventually outgrow the latter, but that could take thousands of bits.

Besides having a somewhat more complicated definition, and being somewhat harder to analyze,
BB_{BLC} has one other downside: it doesn’t represent the ginormous wCubed in 64 bits…