John Tromp's Blog
2024-03-27T08:57:06+00:00
http://tromp.github.io/blog
John Tromp john.tromp@gmail.com

The largest number representable in 64 bits
2023-11-24T00:00:00+00:00
http://tromp.github.io/blog/2023/11/24/largest-number
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┬─┬─────────┬─┬─┬ ┬─┬──
└─┤ ──┬──── │ │ │ ┼─┼─┬
│ ──┼─┬── │ │ │ │ ├─┘
│ ┬─┼─┼─┬ │ │ │ ├─┘
│ └─┤ │ │ │ │ │ │
│ └─┤ │ │ │ │ │
│ ├─┘ │ │ │ │
└─────┤ │ │ │ │
└───┤ │ │ │
└─┤ │ │
└─┤ │
└─┘
</code></pre></div></div>
<p>Most people believe 2<sup>64</sup>-1 = 18446744073709551615, or 0xFFFFFFFFFFFFFFFF in hexadecimal,
to be the largest number representable in 64 bits. In English, it’s quite the mouthful:
eighteen quintillion four hundred forty-six quadrillion seven hundred forty-four
trillion seventy-three billion seven hundred nine million five hundred fifty-one
thousand six hundred fifteen.</p>
<p>That is indeed the maximum possible value of 64-bit unsigned integers,
available as datatype uint64_t in C or u64 in Rust.
We can easily surpass this with floating point numbers. The 64-bit
<a href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format">double floating point format</a>
has a largest (finite) representable value of
2<sup>1024</sup>(1-2<sup>-53</sup>) ~ 1.8*10<sup>308</sup>.</p>
<p>What if we allow representations beyond plain datatypes?
Such as a program small enough to fit in 64 bits.
For most programming languages, there is very little you can do in a mere 8 bytes.
In C that only leaves you with the nothingness of “main(){}”.</p>
<p>But there are plenty of languages that require no such scaffolding. For instance,
on Linux there is the arbitrary-precision calculator “bc”. It happily
computes the 954242 digit number 9^999999 = 35908462…48888889, which can thus
be said to be representable in 64 bits. Had it supported the symbol ! for computing factorials,
then 9!!!!!!! would make a much larger number representable in 64 bits.</p>
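<p>As a quick sanity check (a Python sketch, not bc itself), the digit count follows from logarithms:</p>

```python
import math

# digits of 9^999999 = floor(999999 * log10(9)) + 1
digits = math.floor(999999 * math.log10(9)) + 1
print(digits)  # 954242
```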
<p>Allowing such primitives feels a bit like cheating though. Would we allow a
language that has the <a href="https://en.wikipedia.org/wiki/Ackermann_function">Ackermann function</a>
predefined, which sports the 8-byte expression ack(9,9) representing a truly huge number?</p>
<h2 id="no-primitives-needed">No primitives needed</h2>
<p>As it turns out, the question is moot. There are simple languages with no built-in
primitives. Not even basic arithmetic. Not even numbers themselves.
Languages in which all those must be defined from scratch. One such
language allows us to blow way past ack(9,9) in under 64 bits.</p>
<p>But let’s first look at another such language, one that has been particularly well studied for
producing largest possible outputs. That is the language of
<a href="https://en.wikipedia.org/wiki/Turing_machine">Turing machines</a>.</p>
<h2 id="busy-beaver">Busy Beaver</h2>
<p>The famous <a href="https://en.wikipedia.org/wiki/Busy_beaver">Busy Beaver</a>
function, <a href="https://archive.org/details/bstj41-3-877/mode/2up">introduced</a> by
<a href="https://en.wikipedia.org/wiki/Tibor_Rad%C3%B3">Tibor Radó</a> in 1962, which we’ll
denote BB<sub>TM</sub>(n), is defined as the maximum number of 1s that can be written with
an n-state Turing machine starting from an all-0 tape before halting. Note that if we
consider this output as a number M written in binary, then it only gets
credited for its length, which is log<sub>2</sub>(M+1).</p>
<p>In 64 bits, one can fully specify a 6-state binary Turing machine, or TM for
short. For each of its internal states and each of its 2 tape symbols, one can
specify what new tape symbol it should write in the currently scanned tape
cell, whether to move the tape head left or right, and what new internal state,
or special halt state, to transition to. That takes 6*2*(2+⌈log<sub>2</sub>(6+1)⌉) = 60 bits.
Just how big an output can a 6-state TM produce?</p>
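<p>The bit count works out as follows (a back-of-the-envelope check in Python):</p>

```python
import math

states, symbols = 6, 2
# per table entry: 1 bit for the symbol written, 1 bit for the move
# direction, and ceil(log2(6+1)) = 3 bits for the next state
# (6 states plus the halt state)
bits = states * symbols * (2 + math.ceil(math.log2(states + 1)))
print(bits)  # 60
```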
<p>The best known result for 6 states is
<a href="https://www.sligocki.com/2022/06/21/bb-6-2-t15.html">BB<sub>TM</sub>(6) > 10↑↑15</a>,
which denotes an exponential tower of fifteen 10s. Clearly, in this notation there’s not that
much difference between a number and its size in bits.
Large as this number is, it’s still pathetically small compared to even
ack(5,5), which no known TM of fewer than 10 states—amounting to 110 bits of
description—can surpass.</p>
<p>For that, we need to move beyond Turing machines, into the language of</p>
<h2 id="lambda-calculus">Lambda Calculus</h2>
<p>Alonzo Church conceived the <a href="https://en.wikipedia.org/wiki/Lambda_calculus">λ-calculus</a>
in about 1928 as a formal logic system for expressing
computation based on function abstraction and application using variable binding and substitution.</p>
<p>A tiny 63 bit program in this language represents a number unfathomably larger than not only ack(9,9),
but the far larger <a href="https://en.wikipedia.org/wiki/Graham%27s_number">Graham’s Number</a> as well.
It originates in a Code Golf challenge asking for the
“Shortest terminating program whose output size exceeds Graham’s number”,
<a href="https://codegolf.stackexchange.com/questions/6430/shortest-terminating-program-whose-output-size-exceeds-grahams-number/219734#219734">answered</a>
by user <a href="https://codegolf.stackexchange.com/users/101119/patcail">Patcail</a> and
<a href="https://codegolf.stackexchange.com/questions/6430/shortest-terminating-program-whose-output-size-exceeds-grahams-number/219734#comment533337_219734">further optimized</a> by user
<a href="https://codegolf.stackexchange.com/users/98257/2014melo03">2014MELO03</a>.
With one final optimization applied, the following 63 bit program</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>01 00 01 01 01 01 01 10 10 00 00 00 01 01 01 10 1110 110 10 10 10 10 00 00 01 110 01 110 10
</code></pre></div></div>
<p>is the <a href="https://tromp.github.io/cl/cl.html">Binary Lambda Calculus</a> <a href="https://gist.github.com/tromp/86b3184f852f65bfb814e3ab0987d861#lambda-encoding">encoding</a> of the term</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(λ 1 1 (λ λ λ 1 3 2 1) 1 1 1) (λ λ 2 (2 1))
</code></pre></div></div>
<p>where λ (lambda) denotes an anonymous function, and number i is the variable bound by the i-th nested λ.
This is known as <a href="https://en.wikipedia.org/wiki/De_Bruijn_notation">De Bruijn notation</a>, a
way to avoid naming variables. A more conventional notation using variable names would be</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(λt. t t (λh λf λn. n h f n) t t t) (λf λx. f (f x))
</code></pre></div></div>
<p>The top of this post shows a <a href="https://tromp.github.io/cl/diagrams.html">graphical representation</a> of the term.
The last 16 bits of the program—making up more than a quarter of its size—encode
the term λf λx. f (f x), which takes arguments f and x in turn, and iterates f twice on x.
In general, the function that iterates a given function n times on a given argument
is called Church numeral n, and is the standard way of representing numbers in the λ-calculus.
The program, which we’ll name after its underlying growth rate, can be expressed more legibly as</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wCubed = let { 2 = λf λx. f (f x); H = λh λf λn. n h f n } in 2 2 H 2 2 2
</code></pre></div></div>
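<p>At toy scale, Church numerals and the explosive effect of applying them to each other can be sketched in Python (an illustration only, not the program above):</p>

```python
# Church numeral 2: iterate a function twice
two = lambda f: lambda x: f(f(x))

# convert a Church numeral to a Python int by iterating +1 on 0
unchurch = lambda c: c(lambda n: n + 1)(0)

print(unchurch(two))            # 2
print(unchurch(two(two)))       # 4  = 2^2
print(unchurch(two(two)(two)))  # 16 = 2^(2^2)
```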
<p>The next section is mostly for the benefit of readers familiar with
<a href="https://en.wikipedia.org/wiki/Ordinal_number">ordinal</a>
<a href="https://en.wikipedia.org/wiki/Ordinal_arithmetic">arithmetic</a>,
and is probably better skipped by others.</p>
<h2 id="proof-of-exceeding-grahams-number">Proof of exceeding Graham’s Number</h2>
<p>Following the great suggestion of Googology user “BMS is not well-founded”, let
us start by defining a wCubed-customized
<a href="https://en.wikipedia.org/wiki/Fast-growing_hierarchy">Fast-growing hierarchy</a>, a family that
assigns, to each ordinal α, a function [α] (departing from the usual f<sub>α</sub>
notation for improved legibility) from natural numbers to natural numbers.
We’ll treat all numbers as Church Numerals, so we can write n f instead of the
usual f<sup>n</sup> and write f n instead of f(n) as normally done in λ-calculus.</p>
<h3 id="definitions">Definitions:</h3>
<ol>
<li>H h f n = n h f n</li>
<li>H2 = H 2</li>
<li>[0] n = 2 n = n<sup>2</sup></li>
<li>[α+1] n = n 2 [α] n = H 2 [α] n</li>
<li>[ωα+ω] n = [ωα+n] n</li>
<li>[ω<sup>i+1</sup>(α+1)] n = [ω<sup>i+1</sup>α+ω<sup>i</sup> n] n</li>
</ol>
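<p>Since n 2 is the Church numeral 2<sup>n</sup>, Definition 4 says that [α+1] n applies [α] a total of 2<sup>n</sup> times to n. A numeric Python shadow of the finite levels (a hypothetical helper F, only evaluable for tiny inputs) makes the growth tangible:</p>

```python
def F(alpha, n):
    """Numeric shadow of the finite hierarchy levels:
    [0] n = n^2, and [alpha+1] n applies [alpha] 2^n times to n,
    since n 2 is the Church numeral 2^n used as an iterator."""
    if alpha == 0:
        return n * n
    reps = 2 ** n           # fixed before n gets overwritten below
    for _ in range(reps):
        n = F(alpha - 1, n)
    return n

print(F(0, 3))  # 9
print(F(1, 2))  # 65536: squaring 2^2 = 4 times, starting from 2
```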
<h3 id="lemmas">Lemmas:</h3>
<ol>
<li>2 H 2 [ω i] n = H H2 [ω i] n =<sup>(Def. 1)</sup> n H2 [ω i] n =<sup>(n x Def. 4)</sup> [ω i+n] n =<sup>(Def. 5)</sup> [ω(i+1)] n</li>
<li>3 H 2 [ω<sup>2</sup>i] n = H (2 H 2) [ω<sup>2</sup>i] n =<sup>(Def. 1)</sup> n (2 H 2) [ω<sup>2</sup>i] n =<sup>(n x Lemma 1)</sup> [ω<sup>2</sup>i+ω n] n =<sup>(Def. 6)</sup> [ω<sup>2</sup>(i+1)] n</li>
<li>4 H 2 [0] n = H (3 H 2) [0] n =<sup>(Def. 1)</sup> n (3 H 2) [0] n =<sup>(n x Lemma 2)</sup> [ω<sup>2</sup>n] n =<sup>(Def 6)</sup> [ω<sup>3</sup>] n</li>
</ol>
<p>Lemma 3 gives wCubed = 2 2 H 2 2 2 = 4 H 2 [0] 2 = [ω<sup>3</sup>] 2. In comparison,
Graham’s number is known to be less than the much much smaller [ω+1] 64. As it
turns out, this proof becomes almost trivial in our custom hierarchy. We start
with defining Graham’s number as a Church numeral, exploiting the fact that in
<a href="https://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation">Knuth’s up-arrow notation</a>,
3 ↑ n = 3<sup>n</sup> = upify (mult 3) n, and 3 ↑<sup>k+1</sup> n = (3 ↑<sup>k</sup>
)<sup>n-1</sup> 3 = (3 ↑<sup>k</sup> )<sup>n-1</sup> (3 ↑<sup>k</sup> 1) = (3 ↑<sup>k</sup> )<sup>n</sup> 1:</p>
<h3 id="definitions-1">Definitions:</h3>
<ul>
<li>mult a b f = a (b f)</li>
<li>upify f n = n f 1</li>
<li>g n = n upify (mult 3) 3</li>
<li>Graham = 64 g 4</li>
</ul>
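<p>These definitions can be checked at tiny scale with Church numerals as Python lambdas (g 1 = 3 ↑ 3 = 27 is about as far as direct evaluation gets):</p>

```python
# Church numerals as Python lambdas
church = lambda k: lambda f: lambda x: x if k == 0 else church(k - 1)(f)(f(x))
unchurch = lambda c: c(lambda n: n + 1)(0)

mult  = lambda a: lambda b: lambda f: a(b(f))  # multiplication
upify = lambda f: lambda n: n(f)(church(1))    # n f 1
g     = lambda n: n(upify)(mult(church(3)))(church(3))

print(unchurch(upify(mult(church(3)))(church(2))))  # 9  = 3^2
print(unchurch(g(church(1))))                       # 27 = 3 ↑ 3
```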
<h3 id="lemmas-assuming-n--3">Lemmas (assuming n ≥ 3):</h3>
<ol>
<li>mult 3 n ≤ n<sup>2</sup> = [0] n</li>
<li>upify [α] n = n [α] 1 < 2 n [α] 1 = [α+1] n</li>
<li>g n = n upify (mult 3) 3 ≤<sup>(Lemma 1)</sup> n upify [0] 3 <<sup>(Lemma 2)</sup> [n] n = [ω] n</li>
</ol>
<p>By Lemma 3, Graham = 64 g 4 < 64 [ω] 64 = [ω+1] 64</p>
<h2 id="a-functional-busy-beaver">A Functional Busy Beaver</h2>
<p>Based on the λ-calculus, I recently added to OEIS a
<a href="https://oeis.org/A333479">functional Busy Beaver function</a> BB<sub>λ</sub> that,
besides greater simplicity, has the advantage of
measuring program size in bits rather than states. Note how, similar to BB<sub>TM</sub>(),
the value of BB<sub>λ</sub>() is not the program output considered as a number itself, but
rather the output size. And in the case of the binary λ-calculus, the size of a Church numeral n is 5n+6.
The first unknown BB<sub>TM</sub> is at 5 states, while the first unknown BB<sub>λ</sub> is at 37 bits.</p>
<p>The growth rates of the two BB functions may be compared by how quickly they exceed
that most famous of large numbers: Graham’s number.
The current best effort for BB<sub>TM</sub>, after many rounds of optimization,
is <a href="https://googology.fandom.com/wiki/Busy_beaver_function#Small_values">stuck at 16 states</a>,
weighing in at over 16*2*(2+4) = 192 bits. That compares rather unfavorably with our 63 bits.</p>
<p>The existence of a <a href="https://mathoverflow.net/questions/353514/whats-the-smallest-lambda-calculus-term-not-known-to-have-a-normal-form">29 bit Ackermann-like function</a>
and a <a href="https://github.com/tromp/AIT/blob/master/fast_growing_and_conjectures/E0.lam">79 bit function</a>
growing too fast to be provably total in Peano Arithmetic
has no parallel in the realm of Turing machines, suggesting that the λ-calculus exhibits faster growth.</p>
<p>It further enjoys massive advantages in programmability.
Modern high level pure functional languages like <a href="https://www.haskell.org/">Haskell</a>
are essentially just syntactically sugared λ-calculus,
with programmer friendly features like <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">Algebraic Data Types</a>
translating directly through <a href="https://en.wikipedia.org/wiki/Mogensen%E2%80%93Scott_encoding">Scott encodings</a>.
The <a href="https://bruijn.marvinborner.de/">bruijn programming language</a> is an even
thinner layer of syntactic sugar for the pure untyped lambda calculus, whose
extensive <a href="https://bruijn.marvinborner.de/std/">standard library</a> contains many
datatypes and functions.
It is this excellent programmability of the λ-calculus that facilitated the creation of wCubed.</p>
<p>In contrast, programming a Turing machine has been called impossibly tedious,
which explains why people have resorted to implementing higher level languages like
<a href="https://github.com/sorear/metamath-turing-machines">Not-Quite-Laconic</a>
for writing nontrivial programs that don’t waste too many states.</p>
<p>In his paper <a href="https://scottaaronson.com/papers/bb.pdf">The Busy Beaver Frontier</a>, <a href="https://scottaaronson.com/">Scott Aaronson</a> tries to answer the question</p>
<h2 id="but-why-turing-machines">But why Turing machines?</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>For all their historic importance, haven’t Turing machines been completely superseded
by better alternatives—whether stylized assembly languages or various codegolf languages or Lisp?
As we’ll see, there is a reason why Turing machines were a slightly unfortunate choice
for the Busy Beaver game: namely, the loss incurred when we encode a state transition table
by a string of bits or vice versa.
But Turing machines also turn out to have a massive advantage that compensates for this.
Namely, because Turing machines have no “syntax” to speak of, but only graph structure,
we immediately start seeing interesting behavior even with machines of only 3, 4, or 5 states,
which are feasible to enumerate.
And there’s a second advantage. Precisely because the Turing machine model is so ancient and fixed,
whatever emergent behavior we find in the Busy Beaver game, there can be no suspicion that
we “cheated” by changing the model until we got the results we wanted.
In short, the Busy Beaver game seems like about as good a yardstick as any for gauging humanity’s
progress against the uncomputable.
</code></pre></div></div>
<p>The claimed advantages for the “slightly unfortunate choice” do not hold over that even more
ancient model of the λ-calculus, while the latter’s relatively straightforward binary encoding makes
it a preferable yardstick for exploring the limits of computation. The real question then is
“Why not λ-calculus?”, the answer to which appears to be rooted in historical accident more than anything.</p>
<h2 id="a-universal-busy-beaver">A Universal Busy Beaver</h2>
<p>Is BB<sub>λ</sub> then an ideal Busy Beaver function (apart from a historical lack of study)?
Not quite. It’s still lacking one desirable property, namely universality.</p>
<p>This property mirrors a notion of optimality for shortest description lengths, where it’s known
as the <a href="https://en.wikipedia.org/wiki/Kolmogorov_complexity#Invariance_theorem">Invariance theorem</a>:</p>
<p>Given any description language L, the optimal description language is at least as efficient as L, with some constant overhead.</p>
<p>In the realm of beavers, this means that given any Busy Beaver function BB
(based on self-delimiting programs), an optimal Busy Beaver surpasses it with
at most constant lag:</p>
<p>for some constant c depending on BB, and for all n: BB<sub>opt</sub>(n+c) ≥ BB(n)</p>
<p>While BB<sub>λ</sub> is not universal, it’s not far from one either.
By giving λ-calculus terms access to pure binary data, as in the Binary Lambda Calculus,
function <a href="https://oeis.org/A361211">BB<sub>BLC</sub></a> achieves universality
while lagging only 2 bits behind BB<sub>λ</sub>.
It’s known to eventually outgrow the latter, but that could take thousands of bits.</p>
<p>Besides having a somewhat more complicated definition, and being somewhat harder to analyze,
BB<sub>BLC</sub> has one other downside: it doesn’t represent the ginormous wCubed in 64 bits…</p>
John adds blog
2023-11-20T00:00:00+00:00
http://tromp.github.io/blog/2023/11/20/blog-launched
<p>Well. Finally got around to putting this blog site together. Using these <a href="https://easyperf.net/guides/github-pages">helpful instructions</a> by <a href="https://easyperf.net/">Denis Bakhvalov</a>.</p>
<p>View <a href="http://tromp.github.io/blog/atom.xml">my feed</a>.</p>
Flip Puzzles and Linear Algebra
2021-12-18T00:00:00+00:00
http://tromp.github.io/blog/2021/12/18/flip-puzzles
<p>This post discusses how to solve flip puzzles like Simon Tatham’s “Flip” [1] or David Johnson-Davies’s 16 LEDs puzzle [2], where each cell in a rectangular grid represents a light that can be either on or off, and the goal is to have all lights turned on.</p>
<p>The challenge is that, while each light can be flipped by clicking on its cell (or pressing a button directly below the LED), this action also flips several other lights in a certain pattern.</p>
<p>Because the LED puzzle uses the diagonals through a cell as its patterns, the 16 LEDs puzzle decomposes into 2 disjoint 8 LEDs puzzles:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>_ A _ B H _ I _
C _ D _ _ J _ K
_ E _ F L _ M _
B _ G _ _ N _ O
</code></pre></div></div>
<p>according to the light and dark squares in a checkerboard pattern. Let’s focus on the 8 LEDs puzzle on the left, with the 8 lights marked as A..G.
If we denote the corresponding switches in lowercase a..g, then the behavior of light A can be described as A = a + c + d + f, since pressing any of these 4 switches will flip light A, and the other switches have no effect on A. A switch takes on value 0 if not pressed, and 1 if pressed. Furthermore, since flipping twice is the same as not flipping, we add modulo 2, where 1 + 1 = 0. Mathematicians call this number system GF(2). The behavior of all lights can now be expressed in a single matrix equation</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> A ( 1 0 1 1 0 1 0 ) a
B ( 0 1 0 1 1 0 0 ) b
C ( 1 0 1 0 1 0 1 ) c
D = ( 1 1 0 1 1 1 0 ) d
E ( 0 1 1 1 1 0 1 ) e
F ( 1 0 0 1 0 1 1 ) f
G ( 0 0 1 0 1 1 1 ) g
</code></pre></div></div>
<p>This matrix over GF(2) is not invertible; its rank is only 5, so the lights cannot be changed arbitrarily. By Gaussian elimination [3], one finds that switch f does the same as switches a+b+e, and switch g the same as b+c+d. This means there’s no need to use these switches (f=g=0) and the matrix equation simplifies to</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A ( 1 0 1 1 0 ) a
B ( 0 1 0 1 1 ) b
C = ( 1 0 1 0 1 ) c
D ( 1 1 0 1 1 ) d
E ( 0 1 1 1 1 ) e
</code></pre></div></div>
<p>Using Linear Algebra, we can invert the matrix [4] and express the switch values in terms of the light values:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a ( 0 1 0 1 0 ) A
b ( 1 1 1 0 0 ) B
c = ( 0 1 0 0 1 ) C
d ( 1 0 0 1 1 ) D
e ( 0 0 1 1 1 ) E
</code></pre></div></div>
<p>So if the puzzle has only lights A and E off, and we want to know what switches to press to flip these two lights and no others, then we compute the sum of the 1st and 5th column of the matrix:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 ( 0 1 0 1 0 ) 1
1 ( 1 1 1 0 0 ) 0
1 = ( 0 1 0 0 1 ) 0
0 ( 1 0 0 1 1 ) 0
1 ( 0 0 1 1 1 ) 1
</code></pre></div></div>
<p>to find that switches b, c, and e must be pressed. Indeed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>_ 0 _ 1 _ 1 _ 0 _ 0 _ 1 _ 1 _ 0
0 _ 1 _ 1 _ 0 _ 1 _ 1 _ 0 _ 0 _
_ 1 _ 0 + _ 1 _ 0 + _ 1 _ 0 = _ 1 _ 0
1 _ 0 _ 0 _ 1 _ 1 _ 1 _ 0 _ 0 _
b c e
</code></pre></div></div>
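<p>The whole computation is easy to replay in a few lines of Python, with the matrices copied from above:</p>

```python
# the 5x5 flip system and its inverse from the text, over GF(2)
M = [[1,0,1,1,0],
     [0,1,0,1,1],
     [1,0,1,0,1],
     [1,1,0,1,1],
     [0,1,1,1,1]]
Minv = [[0,1,0,1,0],
        [1,1,1,0,0],
        [0,1,0,0,1],
        [1,0,0,1,1],
        [0,0,1,1,1]]

def matmul(A, B):
    """5x5 matrix product over GF(2)."""
    return [[sum(A[i][k] * B[k][j] for k in range(5)) % 2
             for j in range(5)] for i in range(5)]

# the stated inverse really is an inverse mod 2
assert matmul(Minv, M) == [[int(i == j) for j in range(5)] for i in range(5)]

# lights A and E are off, so flip the vector (1,0,0,0,1)
lights = [1, 0, 0, 0, 1]
switches = [sum(Minv[i][j] * lights[j] for j in range(5)) % 2 for i in range(5)]
print(switches)  # [0, 1, 1, 0, 1] -> press b, c and e
```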
<p>Happy flipping!</p>
<p>[1] <a href="https://www.chiark.greenend.org.uk/~sgtatham/puzzles/js/flip.html">https://www.chiark.greenend.org.uk/~sgtatham/puzzles/js/flip.html</a></p>
<p>[2] <a href="http://www.technoblogy.com/show?3PO0">http://www.technoblogy.com/show?3PO0</a></p>
<p>[3] <a href="https://en.wikipedia.org/wiki/Gaussian_elimination">https://en.wikipedia.org/wiki/Gaussian_elimination</a></p>
<p>[4] <a href="https://en.wikipedia.org/wiki/Gaussian_elimination#Finding_the_inverse_of_a_matrix">https://en.wikipedia.org/wiki/Gaussian_elimination#Finding_the_inverse_of_a_matrix</a></p>
SK numerals
2021-11-05T00:00:00+00:00
http://tromp.github.io/blog/2021/11/05/sk-numerals
<h2 id="church-numerals">Church Numerals</h2>
<p>Church numerals are the standard way of representing natural numbers in the lambda calculus. Cn, the Church numeral for n, iterates a given function n times on a given argument. So we have</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C0 f x = f⁰ x = x
C1 f x = f¹ x = f x
C2 f x = f² x = f (f x)
</code></pre></div></div>
<p>etcetera. We can define a successor function “Csucc” which iterates a given function one more time than a given numeral does:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Csucc = λn.λf.λx. f (n f x)
</code></pre></div></div>
<p>(λn.λf.λx. n f (f x) works equally well). Each Church numeral Cn is thus itself the n’th iterate of Csucc on C0:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cn = Cn Csucc C0
</code></pre></div></div>
<p>The best thing about Church numerals is the ease of defining arithmetic operators. It is straightforward to verify that</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>add = λm.λn.λf.λx. m f (n f x)
mul = λm.λn.λf. m (n f)
pow = λm.λn. n m
</code></pre></div></div>
<p>work as advertised.</p>
<p>E.g. pow C2 C3 = C3 C2 = λf. C2 (C2 (C2 f)) = λf.((f²)²)² = λf.f⁸ = C8</p>
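<p>These operators are easy to test concretely with Church numerals rendered as Python lambdas (an illustration; pow is renamed to avoid shadowing the builtin):</p>

```python
church = lambda k: lambda f: lambda x: x if k == 0 else church(k - 1)(f)(f(x))
unchurch = lambda c: c(lambda n: n + 1)(0)

add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul  = lambda m: lambda n: lambda f: m(n(f))
pow_ = lambda m: lambda n: n(m)

print(unchurch(add(church(2))(church(3))))   # 5
print(unchurch(mul(church(2))(church(3))))   # 6
print(unchurch(pow_(church(2))(church(3))))  # 8 = 2^3
```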
<p>Much less straightforward is the predecessor operator, that takes C(n+1) to Cn and leaves C0 unchanged:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cpred = λn.λf.λx. n(λg.λh. h(g f))(λu. x)(λu. u)
</code></pre></div></div>
<p>Wikipedia tries to explain its operation in detail.</p>
<p>Even less straightforward is the following division operator that springs from the creative mind of Bertram Felgenhauer:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>div = λn.λm.λf.λx.n(λx.λf.f x)(λd.x)(n (λt.m(λx.λf.f x)(λc.f(c t))(λx.x))x)
</code></pre></div></div>
<p>whose operation is illustrated in this Algorithmic Information Theory github repository.</p>
<h2 id="sk-combinatory-logic">SK Combinatory Logic</h2>
<p>Combinatory Logic is concerned with terms consisting of the 2 basic combinators</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S = λx.λy.λz. x z (y z)
K = λx.λy. x
</code></pre></div></div>
<p>combined through application. It turns out that any closed lambda term is equivalent to a combinator. For example, the identity combinator can be obtained as</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I = S K K
</code></pre></div></div>
<p>since I x = S K K x = K x (K x) = x.</p>
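<p>Rendered as Python lambdas, the reduction is immediate:</p>

```python
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = S(K)(K)

print(I(42))        # 42
print(K("a")("b"))  # a
```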
<p>While reading Stephen Wolfram’s latest book, I came across an interesting alternative to Church numerals for Combinatory Logic:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F0 = K
F1 = S K
F2 = S (S K)
F3 = S (S (S K))
</code></pre></div></div>
<p>etcetera, so S iterated n times on K, or Fn = Cn S K.</p>
<p>This takes economy to an extreme, by using the 2 primitive combinators S and K as the very building blocks of numbers:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F0 = K
Fsucc = S
</code></pre></div></div>
<p>We have</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F0 x y = x
F(n+1) x y = S Fn x y = Fn y (x y)
</code></pre></div></div>
<p>so that</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F1 x y = y
F2 x y = x y
F3 x y = y (x y)
F4 x y = x y (y (x y))
</code></pre></div></div>
<p>etcetera, which can be shown by induction to satisfy the Fibonacci recurrence</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F(n+2) x y = (Fn x y) (F(n+1) x y)
</code></pre></div></div>
<p>with sum replaced by application! Consequently, there are exactly fib(n) occurrences of variable y in Fn x y.</p>
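<p>Both the normal forms and the Fibonacci count can be verified mechanically with a small normal-order reducer (a toy sketch; terms are nested (function, argument) tuples, atoms are strings):</p>

```python
def norm(t):
    """Normal-order normalization of an S/K term."""
    if isinstance(t, str):
        return t
    f, a = norm(t[0]), t[1]
    if isinstance(f, tuple) and f[0] == 'K':          # K u a -> u
        return norm(f[1])
    if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 'S':
        u, v = f[0][1], f[1]                          # S u v a -> u a (v a)
        return norm(((u, a), (v, a)))
    return (f, norm(a))

def F(n):
    """SK numeral Fn = S iterated n times on K."""
    t = 'K'
    for _ in range(n):
        t = ('S', t)
    return t

def count_y(t):
    if isinstance(t, str):
        return 1 if t == 'y' else 0
    return count_y(t[0]) + count_y(t[1])

print(norm(((F(3), 'x'), 'y')))  # ('y', ('x', 'y')), i.e. y (x y)
print([count_y(norm(((F(n), 'x'), 'y'))) for n in range(1, 7)])  # [1, 1, 2, 3, 5, 8]
```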
<p>In honor of this close connection we denote these SK numerals with the letter F. Curiously, F0 and F1 coincide with the standard representations of booleans</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>True = K
False = S K
</code></pre></div></div>
<p>The big question, though, is whether they work as numerals. The ultimate test of that is the ability to convert them back into Church numerals. Wolfram’s book suggests a way to do that by applying an SK numeral to x and y that are both Church numerals, say C3 and C2, which leads to all applications working as powers. For example, F3 C3 C2 = C2 (C3 C2) = C2 C8 = C64. And from that we could work our way back to the 3, resulting in a size 181 conversion combinator.</p>
<p>A more elegant conversion can be obtained by defining predecessor and iszero operators, and using these to count down to zero</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Fpred = λn.λx.λy. n (K y) x
Fiszero = λn. n False I S K
F2C = λn. Fiszero n C0 (Csucc (F2C (Fpred n)))
</code></pre></div></div>
<p>We may check that</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Fpred F1 x y = F1 (K y) x = x
Fpred F2 x y = F2 (K y) x = K y x = y
</code></pre></div></div>
<p>and by induction, Fpred F(n+1) = Fn holds for all n. In fact, it works so well, it lets us venture into the negative numbers:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F-1 x y = Fpred F0 x y = F0 (K y) x = K y
F-2 x y = Fpred F-1 x y = F-1 (K y) x = K x
F-3 x y = Fpred F-2 x y = F-2 (K y) x = K (K y)
</code></pre></div></div>
<p>etcetera. And Fsucc works on these negative numerals too.</p>
<p>Fiszero however works on non-negative numerals only, as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Fiszero F0 = F0 False I S K = False S K = K = True
Fiszero F(n+1) = F(n+1) False I S K = S Fn False I S K = Fn I I S K = I S K = S K = False
</code></pre></div></div>
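<p>These checks can be replayed with S and K as Python lambdas; since Python evaluates strictly, the test arguments must be callable:</p>

```python
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = S(K)(K)
FALSE = S(K)                   # False = S K; True = K

F0, F1, F2 = K, S(K), S(S(K))  # SK numerals

Fpred = lambda n: lambda x: lambda y: n(K(y))(x)
Fiszero = lambda n: n(FALSE)(I)(S)(K)

# callable test atoms (strict evaluation applies them even when discarded)
a = lambda z: z
b = lambda z: z

print(Fpred(F2)(a)(b) is b)    # True: Fpred F2 x y = y = F1 x y
print(Fiszero(F0)(a)(b) is a)  # True: result K selects its first argument
print(Fiszero(F2)(a)(b) is b)  # True: result S K selects its second
```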
<p>Using the optimal Y combinator S S K (S (K (S S (S (S S K)))) K), F2C is a combinator of size 69, a huge improvement.</p>
<p>But it turns out we can do better still, courtesy of good old Bertram:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pair = λx.λy. λp. p x y
F2C = λn.(λp. n p(p p) False) (pair (λf.λy.λx. pair f (Csucc y)) C0)
</code></pre></div></div>
<p>This size 60 combinator</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S (S (K (S S (K (K (S K))))) (S S (S K))) (K (S (S (S K K) (K (S (K (S (K (S (K (S (K (S (K (S (K (S S (K (S (K (S (K (S (K K))) S)) (S (K (S (K S) K))))))) K)) S)) K)) S)) (S (S K K)))) K))) (S K)))
</code></pre></div></div>
<p>applies the SK numeral to x and y that are both pairs of some function and a Church numeral. Applying one such pair to another results in the function applied to itself and the two numerals:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(pair f Cm) (pair f Cn) = (pair f Cn) f Cm = f f Cn Cm
</code></pre></div></div>
<p>so all the function f has to do is take the successor of the right Church numeral and wrap it back up into a pair. In the F2C definition above, p is a wrapped C0 and (p p) becomes a wrapped C1.</p>
<p>And that wraps up our brief exploration of SK numerals.</p>
A case for using soft total supply
2020-12-20T00:00:00+00:00
http://tromp.github.io/blog/2020/12/20/soft-supply
<p>The cryptocurrency space features endless debates about the pros and cons of various possible emission curves. Prime among them is the question of whether supply should be capped (finite) or uncapped (infinite). But is this really an essential difference?</p>
<p>It doesn’t take much to change a finite supply into an infinite one. Bitcoin’s block reward drops from 1 satoshi to 0 satoshi somewhere in the year 2140, completing a supply of approximately 21 million bitcoin [1]. What if that final drop never happened?</p>
<p>Bitcoin would continue emitting a 1 satoshi block reward in perpetuity, yielding an infinite supply. It would eventually reach 21 million + 1 bitcoin. Let’s see when that happens. We need to wait an additional 100 million blocks, which takes approximately 10⁸/6/24/365 = 1902 years.</p>
<p>Would that make Bitcoin any less hard a currency? I don’t think anyone would be willing to argue that. So what makes a currency hard, if not a capped supply? I think the answer is obvious. Eventual negligible inflation is what makes for hard currency. It is what distinguishes cryptocurrencies from fiat.</p>
<p>How low should inflation be to be considered negligible? There are two related
measures for quantifying this. One is the yearly supply inflation rate,
commonly used for fiat currencies. The other is its inverse, stock-to-flow
ratio, the ratio between existing supply (stock) and the supply to be added in
the next year (flow). This measure is popular for commodities. As discussed
more extensively in [2],
bitcoin’s inflation rate fell below 2% in 2020, and will fall below that of
Gold in the next few years.</p>
<p>As long as the block reward (flow) never increases, then just by having supply increase every year, the inflation rate will naturally fall towards zero. The technical term for this is “disinflationary”. The inflation rate never reaches 0, but it gets arbitrarily close.</p>
<p>In cryptocurrency practice though, it doesn’t matter whether inflation has reached 0.1% or 0.01%. And that is because unlike Gold, it’s rather easy to lose cryptocurrency. Estimates for the fraction of all Bitcoin that is forever lost range from 15% to as much as 25%. People forget or lose passwords or take them to the grave. People accidentally erase wallet files. People accidentally send bitcoin to obsolete addresses, or straight into the void. People explicitly burn bitcoin. With newer coins, standards for recovery phrases, more mature software and hardware wallets, we can expect much lower loss percentages, but getting the yearly loss rate significantly below 1% will remain a challenge.</p>
<p>This motivates a definition of the “soft” end of emission as the time when the inflation rate drops below 1%, and the soft (total) supply as the supply at the time. At this time, new supply more or less balances the inevitable ongoing loss of coins. After its 4th halving in 2024, bitcoin will reach the soft end of its emission with a soft supply of (15/16) * 21M = 19.6875M BTC. At the extreme end of slow emissions is Grin. Its pure linear emission of 1 Grin per second forever, will reach its soft supply of a century (100*365*24*60*60s) of Grin, only in 2119.</p>
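<p>Both figures are a one-liner to check:</p>

```python
# bitcoin: soft supply after the 4th halving
btc_soft = 15 / 16 * 21_000_000
print(btc_soft)   # 19687500.0 BTC

# grin: a century of 1 Grin per second
grin_soft = 100 * 365 * 24 * 60 * 60
print(grin_soft)  # 3153600000
```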
<p>This allows for a sensible supply comparison of nearly all coins. The only coins with uncapped soft supply are those with some explicit positive percentage growth rate, such as what fiat currencies aim for. I think there are only a handful of proof-of-stake coins that fall into that category. For all others, it gives a sensible measure of how far along a coin is in its emission. Using soft supply, a daily dollar issuance ranking like [3] would no longer need to show infinities for remaining supply of coins with a tail emission.</p>
<p>[1] <a href="https://medium.com/amberdata/why-the-bitcoin-supply-will-never-reach-21-million-7263e322de1">https://medium.com/amberdata/why-the-bitcoin-supply-will-never-reach-21-million-7263e322de1</a></p>
<p>[2] <a href="https://medium.com/the-capital/stock-to-flow-ratio-why-bitcoins-value-increases-after-each-halving-d08c23d46a08">https://medium.com/the-capital/stock-to-flow-ratio-why-bitcoins-value-increases-after-each-halving-d08c23d46a08</a></p>
<p>[3] <a href="https://www.f2pool.com/coins">https://www.f2pool.com/coins</a></p>
Beyond the Hashcash Proof-of-Work (there’s more to mining than hashing)2015-09-07T00:00:00+00:00http://tromp.github.io/blog/2015/09/07/beyond-hashcash<p>Many people equate Proof of Work (PoW) with one particular instance of it. It’s not hard to understand why. The Hashcash PoW is used not only in Bitcoin but in the vast majority of altcoins as well.</p>
<p>In Hashcash, miners all compete to look for a so-called “nonce” which, when provided as input (together with other parts of a block header) to a hash function, yields an output that’s numerically small enough to claim the next block reward.</p>
<p>Where most cryptocurrencies differ is in the choice of hash function; the Hashcash flavor, as it were. Besides Bitcoin’s “vanilla” flavor of SHA256, there is Litecoin’s scrypt, Cryptonote’s CryptoNight, Darkcoin’s X11, and many more. Most alternative flavors have the explicitly stated goal of reducing the performance gap between custom and commodity hardware, either by use of memory, or by sheer complexity.</p>
<p>But miners are only part of the picture. Proofs of work must not only be found, but verified as well, by every single client, including smartphones and other devices with limited resources. In Hashcash, verification amounts to evaluating the hash function on the given nonce and comparing the output with the difficulty threshold. Which is exactly the same effort as a single proof attempt.</p>
<p>Thus, in order to keep verification cheap, hash functions in Hashcash must restrict their resource usage as well. That’s why scrypt is configured to use only 128KB of memory.</p>
<p>Non-Hashcash PoWs do not suffer this limitation; they are asymmetric, with verification much cheaper than a proof attempt. The first such PoW appeared in Primecoin, which finds chains of nearly doubled prime numbers. The most recent example is my Cuckoo Cycle PoW, which was presented at the BITCOIN’2015 workshop in January. The whitepaper can be found at github.com/tromp/cuckoo, which also hosts various implementations, as well as bounties for improving on them.</p>
<p>In Cuckoo Cycle, proofs take the form of a length-42 cycle (loop) in a large random graph defined by some nonce. Imagine two countries, each with a billion cities, and imagine picking a billion border-crossing roads that connect a random city in one country to a random city in the other country (the PoW actually uses a cheaply computed hash function to map the nonce, road number, and country to a city). We are asked if there is a cycle of 42 roads visiting 42 different cities. If someone hands you a nonce and 42 road numbers, it is indeed easy to verify, requiring negligible time and memory.</p>
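<p>To illustrate how cheap verification is, here is a sketch in Python. It is not the actual Cuckoo Cycle code: the real PoW uses siphash-2-4, for which blake2b stands in here as a hypothetical city mapping, and the function names are made up:</p>

```python
import hashlib
from collections import defaultdict

def city(nonce, road, country, n_cities):
    # hypothetical stand-in for the cheap hash mapping a road end to a city
    h = hashlib.blake2b(f"{nonce}:{road}:{country}".encode(), digest_size=8)
    return (country, int.from_bytes(h.digest(), "big") % n_cities)

def verify(nonce, roads, n_cities, proof_size=42):
    if len(set(roads)) != proof_size:
        return False
    ends = {r: (city(nonce, r, 0, n_cities), city(nonce, r, 1, n_cities))
            for r in roads}
    at = defaultdict(list)                  # city -> incident proof roads
    for r, (u, v) in ends.items():
        at[u].append(r)
        at[v].append(r)
    if any(len(rs) != 2 for rs in at.values()):
        return False                        # every city lies on exactly 2 roads
    # walk the roads; a valid proof is one single cycle of length proof_size
    start = min(roads)
    r, c = start, ends[start][1]
    length = 1
    while True:
        r = next(x for x in at[c] if x != r)          # leave via the other road
        if r == start:
            return length == proof_size
        c = ends[r][1] if ends[r][0] == c else ends[r][0]
        length += 1
```

<p>Note that verification only ever touches the 42 proof roads, regardless of how many billions of roads the full graph contains.</p>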
<p>But finding such a cycle is no easy task. Note however, that a city that connects to one road only cannot be part of the solution, nor can that road. David Andersen pointed out that such dead-end roads can be repeatedly eliminated, using one bit of memory per road to remember if that road is useful, and two bits per city to count if there are zero, one, or multiple useful roads to that city.</p>
<p>This process of computing counts for cities, and marking roads that lead to a city with count one as not useful, is the essence of Cuckoo Cycle mining and accounts for about 98% of the effort. It results in billions of random global memory accesses for reading and writing the counters. Consequently, about 2/3 of the runtime is memory latency, making this a low-power algorithm that keeps computers running cool.</p>
<p>After a sufficient number of counting and marking rounds, so few useful roads remain that another algorithm, inspired by Cuckoo Hashing, can quickly identify cycles (re-using the memory for the no longer needed counters).</p>
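<p>The counting and marking rounds described above can be sketched in a few lines of Python. Again this is only an illustration with a hypothetical blake2b stand-in for the cheap hash, not the actual miner:</p>

```python
import hashlib

def city(nonce, road, country, n_cities):
    # hypothetical stand-in for the cheap hash mapping a road end to a city
    h = hashlib.blake2b(f"{nonce}:{road}:{country}".encode(), digest_size=8)
    return int.from_bytes(h.digest(), "big") % n_cities

def trim(nonce, n_roads, n_cities, rounds=10):
    useful = set(range(n_roads))         # one bit per road: still useful?
    for i in range(rounds):
        country = i % 2                  # alternate between the two countries
        count = {}                       # per city: zero, one, or many roads
        for r in useful:
            c = city(nonce, r, country, n_cities)
            count[c] = min(count.get(c, 0) + 1, 2)
        # roads into a count-1 city are dead ends: mark them not useful
        useful = {r for r in useful
                  if count[city(nonce, r, country, n_cities)] == 2}
    return useful
```

<p>With as many roads as cities, each round eliminates a large fraction of the remaining roads, leaving only roads that lie on cycles for the final cycle-finding phase.</p>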
<p>Cuckoo Cycle has some downsides as well. First of all, proofs are large and will roughly triple the size of block headers. Secondly, it is very slow, taking the better part of a minute on a high end CPU (or GPU, which offer roughly the same speed) to look for a cycle among a billion roads.</p>
<p>In order to give slower CPUs a (somewhat) fair chance to win, the block interval should be much longer than a single proof attempt, so the amount of memory Cuckoo Cycle can use is constrained by the choice of block interval length.</p>
<p>These seem like reasonable compromises for an instantly verifiable memory bound PoW that is unique in being dominated by latency rather than computation. In that sense, mining Cuckoo Cycle is a form of ASIC mining where DRAM chips serve the application of randomly reading and writing billions of bits.</p>
<p>When even phones charging overnight can mine without orders-of-magnitude loss in efficiency, with a mindset not of profitability but of playing the lottery, the mining hardware landscape will see vast expansion, benefiting adoption as well as decentralization.</p>
Useless Darts Trivia2014-12-22T00:00:00+00:00http://tromp.github.io/blog/2014/12/22/darts-trivia<h2 id="which-scores-are-finishes">Which scores are finishes?</h2>
<p>A finish of 150 or more must end on double 18, double 20, or the bull. So one can throw 36, 40, or 50 plus two triples summing to any multiple of 3 up to 120, and these three final scores are 0, 1, and 2 modulo 3 respectively. In a picture:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ... 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170
triple+36 * * *
triple+40 * * * *
triple+50 * * * * * * *
finishes: ... 150 151 152 153 154 155 156 157 158 160 161 164 167 170
</code></pre></div></div>
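<p>The picture can be double-checked by brute force. A sketch in Python, where a finish is at most three darts ending on a double (with the bull counting as a double):</p>

```python
singles = range(1, 21)
scores = {s for n in singles for s in (n, 2 * n, 3 * n)} | {25, 50}
doubles = {2 * n for n in singles} | {50}   # the bull counts as a double

def is_finish(total):
    # at most three darts, the last of which must be a double
    for d in doubles:
        rest = total - d
        if rest == 0 or rest in scores:
            return True
        if any(rest - a in scores for a in scores):
            return True
    return False

# the only scores up to 170 that are not finishes:
bogeys = [n for n in range(2, 171) if not is_finish(n)]
```

<p>This confirms the gaps in the picture: bogeys comes out as [159, 162, 163, 165, 166, 168, 169].</p>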
<h2 id="how-many-possible-9-darter-games-are-there">How many possible 9-darter games are there?</h2>
<p>This seems rather difficult to count until we start to distinguish cases based on the final double and the minimum score of the first 8 darts. The 50,50 entry for instance is calculated as the number of ways to partition 501 - 50 - 8 * 50 = 51 into eight parts from {0, 1, 4, 7, 10} (the surpluses over 50 of the dart scores 50, 51, 54, 57, and 60), with at least one 0:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>( 8 ) ( 8 ) ( 8 )
(2 1 0 0 5) + (2 0 1 1 4) + (2 0 0 3 3) = 168 + 840 + 560 = 1568.
\dbl 24 30 34 36 40 50
min\ --------------------------------
34 56
40 672
45 8
48 56
50 56 672 1568
51 8 224
54 56 448
57 8 56 56
</code></pre></div></div>
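<p>That 50,50 entry can be verified by direct enumeration (a sketch; 5^8 = 390625 sequences is nothing for a computer):</p>

```python
from itertools import product

# surpluses over 50 of the dart scores 50 (bull), 51, 54, 57, 60
surpluses = [0, 1, 4, 7, 10]

# ordered sequences of 8 such darts summing to 501 - 50 - 8*50 = 51,
# with at least one bull (surplus 0)
cell = sum(1 for seq in product(surpluses, repeat=8)
           if sum(seq) == 51 and 0 in seq)
```

<p>This agrees with the multinomial calculation: cell equals 168 + 840 + 560 = 1568.</p>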
<p>Giving a grand total of 3944 possible 9-darters, as confirmed in this report, which also states the number of essentially different solutions (as a multi-set of first 8 dart scores) as 22.</p>
<p>Note that a finishing double of e.g. 38 is not possible, as there is no way to partition 501 - 8*60 - 38 = -17 into the numbers -3, -6, -9, -10, -12, and -15, the deficits below 60 of the usable dart scores 57, 54, 51, 50, 48, and 45.</p>
<p>In practice you’ll only see 9-darters starting with two 180s, leaving a 141 finish. The number of ways to finish 141 is calculated similarly:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> \dbl 24 30 34 36 40 50
min\ --------------------------------
34 2
40 2
45 2
48 2
50 2 2
51 2 2
54 2
57 2
</code></pre></div></div>
<p>That’s 20 ways.</p>
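<p>Both totals can be confirmed by brute force in a few lines. Counting ordered sequences of dart scores, as the multinomials above do, a small dynamic program tallies the 9-darters, and direct enumeration the 141 finishes (a sketch):</p>

```python
singles = range(1, 21)
scores = sorted({s for n in singles for s in (n, 2 * n, 3 * n)} | {25, 50})
doubles = sorted({2 * n for n in singles} | {50})   # bull counts as a double

# ways[s] = number of ordered sequences of 8 dart scores summing to s
ways = [1] + [0] * 501
for _ in range(8):
    nxt = [0] * 502
    for s, w in enumerate(ways):
        if w:
            for d in scores:
                if s + d <= 501:
                    nxt[s + d] += w
    ways = nxt

# 9 darts totaling 501, the last of which is a double
nine_darters = sum(ways[501 - d] for d in doubles)

# two darts plus a finishing double totaling 141
finishes_141 = sum(1 for a in scores for b in scores for d in doubles
                   if a + b + d == 141)
```

<p>This reproduces the grand total of 3944 nine-darters and the 20 ways to finish 141.</p>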
<h2 id="what-is-the-most-impressive-9-darter">What is the most impressive 9-darter?</h2>
<p>That would be throwing three 167 “finishes” in a row, repeating triple 20, triple 19, and bullseye. The bullseye is harder to hit than any triple, and a 9-darter can include at most three of them. I will likely never witness that in my lifetime, but then who could have imagined all of these incredible rarities?</p>