The largest number representable in 64 bits

24 Nov 2023

┬─┬─────────┬─┬─┬ ┬─┬──
└─┤ ──┬──── │ │ │ ┼─┼─┬
  │ ──┼─┬── │ │ │ │ ├─┘
  │ ┬─┼─┼─┬ │ │ │ ├─┘  
  │ └─┤ │ │ │ │ │ │    
  │   └─┤ │ │ │ │ │    
  │     ├─┘ │ │ │ │    
  └─────┤   │ │ │ │    
        └───┤ │ │ │    
            └─┤ │ │    
              └─┤ │    
                └─┘    

Most people believe 2^64 − 1 = 18446744073709551615, or 0xFFFFFFFFFFFFFFFF in hexadecimal, to be the largest number representable in 64 bits. In English, it's quite the mouthful: eighteen quintillion four hundred forty-six quadrillion seven hundred forty-four trillion seventy-three billion seven hundred nine million five hundred fifty-one thousand six hundred fifteen.

That is indeed the maximum possible value of a 64-bit unsigned integer, available as the datatype uint64_t in C or u64 in Rust. We can easily surpass this with floating point numbers: the 64-bit double floating point format has a largest (finite) representable value of 2^1024 · (1 − 2^−53) ≈ 1.8 × 10^308.
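
Both limits are easy to check mechanically, for instance with a quick Haskell sketch (the helper name largestDouble is mine; encodeFloat assembles a floating point value from significand and exponent):

import Data.Word (Word64)

-- largest finite IEEE-754 double: (2^53 - 1) * 2^(1024-53) = 2^1024 * (1 - 2^-53)
largestDouble :: Double
largestDouble = encodeFloat (2 ^ 53 - 1) (1024 - 53)

main :: IO ()
main = do
  print (maxBound :: Word64)  -- 18446744073709551615
  print largestDouble         -- 1.7976931348623157e308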

What if we allow representations beyond plain datatypes, such as a program small enough to fit in 64 bits? For most programming languages, there is very little you can do in a mere 8 bytes. In C, that only leaves you with the nothingness of “main(){}”.

But there are plenty of languages that require no such scaffolding. For instance, on Linux there is the arbitrary-precision calculator “bc”. It happily computes the 954242-digit number 9^999999 = 35908462…48888889, which can thus be said to be representable in 64 bits. Had bc supported the symbol ! for computing factorials, then 9!!!!!!! would make a much larger number representable in 64 bits.
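
The digit count is easy to confirm independently; for instance, this Haskell one-liner (my own check, not part of the bc session) uses the equally arbitrary-precision Integer type:

main :: IO ()
main = print (length (show (9 ^ 999999 :: Integer)))  -- prints 954242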

Allowing such primitives feels a bit like cheating though. Would we allow a language that has the Ackermann function predefined, which sports the 8-byte expression ack(9,9) representing a truly huge number?
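
For reference, the Ackermann function in question is definable in a few lines of Haskell; just don't expect ack 9 9 to ever return, given that even ack 4 2 has 19729 digits:

-- the two-argument Ackermann(-Péter) function
ack :: Integer -> Integer -> Integer
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))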

No primitives needed

As it turns out, the question is moot. There are simple languages with no built-in primitives. Not even basic arithmetic. Not even numbers themselves. Languages in which all those must be defined from scratch. One such language allows us to blow way past ack(9,9) in under 64 bits.

But let’s first look at another such language, one that has been particularly well studied for producing largest possible outputs. That is the language of Turing machines.

Busy Beaver

The famous Busy Beaver function, introduced by Tibor Radó in 1962, which we'll denote BB_TM(n), is defined as the maximum number of 1s that an n-state Turing machine can write, starting from an all-0 tape, before halting. Note that if we consider this output as a number M written in binary, then it only gets credited for its length, which is log₂(M+1).

In 64 bits, one can fully specify a 6-state binary Turing machine, or TM for short. For each of its internal states and each of its 2 tape symbols, one can specify what new tape symbol it should write in the currently scanned tape cell, whether to move the tape head left or right, and what new internal state, or special halt state, to transition to. That takes 6*2*(2+⌈log₂(6+1)⌉) = 60 bits.
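
The same size arithmetic works for any number of states; a throwaway Haskell helper (names mine) reproduces the count:

-- bits needed to specify an n-state binary TM: for each (state, symbol)
-- pair, 1 bit for the new symbol, 1 bit for the direction, and enough
-- bits to select one of n+1 targets (n states plus the halt state)
tmBits :: Integer -> Integer
tmBits n = n * 2 * (2 + ceilLog2 (n + 1))
  where ceilLog2 m = head [k | k <- [0 ..], 2 ^ k >= m]

-- tmBits 6 == 60

Just how big an output can a 6-state TM produce?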

The best known result for 6 states is BB_TM(6) > 10↑↑15, which denotes an exponential tower of fifteen 10s. Clearly, in this notation there's not that much difference between a number and its size in bits. Large as this number is, it's still pathetically small compared to even ack(5,5), which no known TM of fewer than 10 states (amounting to 110 bits of description) can surpass.

For that, we need to move beyond Turing machines, into the language of

Lambda Calculus

Alonzo Church conceived the λ-calculus around 1928 as a formal logic system for expressing computation based on function abstraction and application, using variable binding and substitution.

A tiny 63-bit program in this language represents a number unfathomably larger than not only ack(9,9), but the far larger Graham's Number as well. It originates in a Code Golf challenge asking for the “Shortest terminating program whose output size exceeds Graham's number”, answered by user Patcail and further optimized by user 2014MELO03. With one final optimization applied, the following 63-bit program

01 00 01 01 01 01 01 10 10 00 00 00 01 01 01 10 1110 110 10 10 10 10 00 00 01 110 01 110 10

is the Binary Lambda Calculus encoding of the term

(λ 1 1 (λ λ λ 1 3 2 1) 1 1 1) (λ λ 2 (2 1))

where λ (lambda) denotes an anonymous function, and number i refers to the variable bound by the i-th enclosing λ. This is known as De Bruijn notation, a way to avoid naming variables. A more conventional notation using variable names would be

(λt. t t (λh λf λn. n h f n) t t t) (λf λx. f (f x))

The top of this post shows a graphical representation of the term. The last 16 bits of the program, making up more than a quarter of its size, encode the term λf λx. f (f x), which takes arguments f and x in turn, and iterates f twice on x. In general, the function that iterates a given function n times on a given argument is called Church numeral n, and is the standard way of representing numbers in the λ-calculus. The program, which we'll name after its underlying growth rate, can be expressed more legibly as

wCubed = let { 2 = λf λx. f (f x); H = λh λf λn. n h f n } in 2 2 H 2 2 2
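
To get a feel for how Church numerals compound under application, here is a small Haskell sketch (Hindley–Milner let-polymorphism handles these particular self-applications, though the H combinator itself, whose body n h f n applies n to itself, would require an infinite type, so wCubed as a whole stays out of Haskell's reach):

-- Church numeral 2: iterate a function twice
two :: (a -> a) -> a -> a
two f x = f (f x)

-- applying numeral m to numeral n gives n^m: two two is the
-- numeral 2^2 = 4, and (two two) two is the numeral 2^4 = 16
main :: IO ()
main = print (two two two (+ 1) (0 :: Int))  -- prints 16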

The next section is mostly for the benefit of readers familiar with ordinal arithmetic, and is probably better skipped by others.

Proof of exceeding Graham’s Number

Following the great suggestion of Googology user “BMS is not well-founded”, let us start by defining a wCubed-customized fast-growing hierarchy: a family that assigns to each ordinal α a function [α] (diverting from the usual f_α notation for improved legibility) from natural numbers to natural numbers. We'll treat all numbers as Church numerals, so we can write n f instead of the usual f^n, and write f n instead of f(n) as normally done in λ-calculus.

Definitions:

  1. H h f n = n h f n
  2. H2 = H 2
  3. [0] n = 2 n = n²
  4. [α+1] n = n 2 [α] n = H 2 [α] n
  5. [ωα+ω] n = [ωα+n] n
  6. [ω^(i+1)(α+1)] n = [ω^(i+1)α + ω^i n] n
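
For the smallest ordinals and arguments, these rules can be transcribed directly into Haskell. The sketch below (my own, purely illustrative, and only runnable for the very tiniest inputs) represents an ordinal ω²a + ωb + c below ω³ as a triple (a, b, c):

type Ord3 = (Integer, Integer, Integer)  -- (a, b, c) stands for ω²a + ωb + c

fgh :: Ord3 -> Integer -> Integer
fgh (0, 0, 0) n = n * n                         -- Def. 3: [0] n = n²
fgh (a, b, c) n
  | c > 0     = iterate (fgh (a, b, c - 1)) n
                  !! fromInteger (2 ^ n)        -- Def. 4: iterate [α] 2^n times on n
  | b > 0     = fgh (a, b - 1, n) n             -- Def. 5: [ωα+ω] n = [ωα+n] n
  | otherwise = fgh (a - 1, n, 0) n             -- Def. 6: [ω²(α+1)] n = [ω²α+ωn] n

-- fgh (0,0,1) 2 == 65536; fgh (0,0,2) 2 is already hopelessly infeasible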

Lemmas:

  1. 2 H 2 [ωi] n = H H2 [ωi] n =(Def. 1) n H2 [ωi] n =(n × Def. 4) [ωi+n] n =(Def. 5) [ω(i+1)] n
  2. 3 H 2 [ω²i] n = H (2 H 2) [ω²i] n =(Def. 1) n (2 H 2) [ω²i] n =(n × Lemma 1) [ω²i+ωn] n =(Def. 6) [ω²(i+1)] n
  3. 4 H 2 [0] n = H (3 H 2) [0] n =(Def. 1) n (3 H 2) [0] n =(n × Lemma 2) [ω²n] n =(Def. 6) [ω³] n

Lemma 3 gives wCubed = 2 2 H 2 2 2 = 4 H 2 [0] 2 = [ω³] 2. In comparison, Graham's number is known to be less than the much, much smaller [ω+1] 64. As it turns out, this proof becomes almost trivial in our custom hierarchy. We start by defining Graham's number as a Church numeral, exploiting the fact that in Knuth's up-arrow notation, 3 ↑ n = 3^n = upify (times 3) n, and 3 ↑^(k+1) n = (3 ↑^k)^(n−1) 3 = (3 ↑^k)^(n−1) (3 ↑^k 1) = (3 ↑^k)^n 1 = upify (3 ↑^k) n:

Definitions:

  1. times m n f = m (n f)
  2. upify f n = n f 1
  3. g n = n upify (times 3) 3
  4. Graham = 64 g 4

Lemmas (assuming n ≥ 3):

  1. times 3 n ≤ n² = [0] n
  2. upify [α] n = n [α] 1 < n 2 [α] n = [α+1] n
  3. g n = n upify (times 3) 3 ≤(Lemma 1) n upify [0] 3 <(Lemma 2) [n] n =(Def. 5) [ω] n

By Lemma 3, Graham = 64 g 4 < 64 [ω] 64 < [ω+1] 64.
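
In the same Haskell style, the sketch below (my own, on plain Integers; evaluating graham is of course hopeless) mirrors these definitions:

times3 :: Integer -> Integer
times3 = (3 *)

-- upify f n = n f 1, turning 3↑^k into 3↑^(k+1)
upify :: (Integer -> Integer) -> Integer -> Integer
upify f n = iterate f 1 !! fromInteger n

-- g n = n upify (times 3) 3 = 3 ↑^n 3, so e.g. g 2 = 3↑↑3 = 7625597484987
g :: Integer -> Integer
g n = (iterate upify times3 !! fromInteger n) 3

-- Graham = 64 g 4
graham :: Integer
graham = iterate g 4 !! 64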

A Functional Busy Beaver

Based on the λ-calculus, I recently added to OEIS a functional Busy Beaver function BB_λ that, besides being simpler, has the advantage of measuring program size in bits rather than states. Note how, similar to BB_TM(), the value of BB_λ() is not the program output considered as a number itself, but rather the output size. In the case of binary λ-calculus, the size of Church numeral n is 5n+6 bits. The first unknown BB_TM value is at 5 states, while the first unknown BB_λ value is at 37 bits.
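
The 5n+6 formula is easy to verify with a toy encoder in Haskell, using the standard BLC code seen earlier: 00 for λ, 01 for application, and i 1s followed by a 0 for de Bruijn index i:

data Term = Var Int | Lam Term | App Term Term

encode :: Term -> String
encode (Var i)   = replicate i '1' ++ "0"
encode (Lam t)   = "00" ++ encode t
encode (App s t) = "01" ++ encode s ++ encode t

-- Church numeral n is λλ 2 (2 ... (2 1)), with n applications of variable 2
church :: Int -> Term
church n = Lam (Lam (iterate (App (Var 2)) (Var 1) !! n))

-- length (encode (church n)) == 5*n + 6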

The growth rates of the two BB functions may be compared by how quickly they exceed that most famous of large numbers: Graham's number. The current best effort for BB_TM, after many rounds of optimization, is stuck at 16 states, weighing in at over 16*2*(2+4) = 192 bits. That compares rather unfavorably with our 63 bits.

The existence of a 29-bit Ackermann-like function and of a 79-bit function growing too fast to be provably total in Peano Arithmetic also has no parallel in the realm of Turing machines, suggesting that the λ-calculus exhibits faster growth.

It further enjoys massive advantages in programmability. Modern high-level pure functional languages like Haskell are essentially just syntactically sugared λ-calculus, with programmer-friendly features like Algebraic Data Types translating directly through Scott encodings. The bruijn programming language is an even thinner layer of syntactic sugar over the pure untyped λ-calculus, and its extensive standard library contains many datatypes and functions. It is this excellent programmability of the λ-calculus that facilitated the creation of wCubed.

In contrast, programming a Turing machine has been called impossibly tedious, which explains why people have resorted to implementing higher-level languages like Not-Quite-Laconic for writing nontrivial programs that don't waste too many states.

In his paper The Busy Beaver Frontier, Scott Aaronson tries to answer the question

But why Turing machines?

For all their historic importance, haven’t Turing machines been completely superseded
by better alternatives—whether stylized assembly languages or various codegolf languages or Lisp?
As we’ll see, there is a reason why Turing machines were a slightly unfortunate choice
for the Busy Beaver game: namely, the loss incurred when we encode a state transition table
by a string of bits or vice versa.
But Turing machines also turn out to have a massive advantage that compensates for this.
Namely, because Turing machines have no “syntax” to speak of, but only graph structure,
we immediately start seeing interesting behavior even with machines of only 3, 4, or 5 states,
which are feasible to enumerate.
And there’s a second advantage. Precisely because the Turing machine model is so ancient and fixed,
whatever emergent behavior we find in the Busy Beaver game, there can be no suspicion that
we “cheated” by changing the model until we got the results we wanted.
In short, the Busy Beaver game seems like about as good a yardstick as any for gauging humanity's
progress against the uncomputable.

The claimed advantages of the “slightly unfortunate choice” do not hold over the even more ancient model of the λ-calculus, while the latter's relatively straightforward binary encoding makes it a preferable yardstick for exploring the limits of computation. The real question then is “Why not λ-calculus?”, the answer to which appears to be rooted in historical accident more than anything.

A Universal Busy Beaver

Is BB_λ then an ideal Busy Beaver function (apart from a historical lack of study)? Not quite. It's still lacking one desirable property, namely universality.

This property mirrors a notion of optimality for shortest description lengths, where it’s known as the Invariance theorem:

Given any description language L, the optimal description language is at least as efficient as L, with some constant overhead.

In the realm of beavers, this means that given any Busy Beaver function BB (based on self-delimiting programs), an optimal Busy Beaver surpasses it with at most constant lag:

for some constant c depending on BB, and for all n: BB_opt(n+c) ≥ BB(n)

While BB_λ is not universal, it's not far from one either. By giving λ-calculus terms access to pure binary data, as in the Binary Lambda Calculus, the function BB_BLC achieves universality while lagging only 2 bits behind BB_λ. It's known to eventually outgrow the latter, but that could take thousands of bits.

Besides having a somewhat more complicated definition, and being somewhat harder to analyze, BB_BLC has one other downside: it doesn't represent the ginormous wCubed in 64 bits…