The Fhourstones Benchmark

The Fhourstones Benchmark (version 3.1)

MACHINE	CPU	MHZ	COMPILER/JVM	KPOS/S
Apple iMac (late 2013)	Intel Core i5	3200	clang -O3	12123
PC running Linux	Intel Xeon E5-2687W	3660	cc -O3 -march=native	12032
PC running Ubuntu 9.10	Intel Core i7 975	3333	gcc-4.4.1 -O3 -march=native -m64	10741
PC running Ubuntu 9.10	Intel Core i7 920	2667	gcc-4.4.1 -O3	9069
MacBook Air 13.3 running OS X Lion	Intel Core i5	1700	cc -O3	8089
PC running RedHat Linux	Xeon 5160	3000	gcc-4.1.2 -O3	7653
homemade workstation	overclocked AMD Opteron 250	2592	gcc-4.1.2 -march=opteron -m64 -O3	6501
Supermicro PC	Intel Core2 duo 6600	2400	gcc -m64 -O3	6444
PC running Ubuntu 9.10	Intel Core i7 975	3333	javac.1.6.0 -O; java -d64 -Xms1024m -Xmixed	6310
ASUS A8N5X Windows XP/64	AMD Athlon64 4800+	2400	Visual C 8.0 /O2...	6141
MacBook Pro	Intel Core 2 Duo T7500	2200	gcc-4.3.0 -O3 -m64 ...	6077
PC running Debian	AMD Athlon 64 3400+	2400	gcc-4.0.3-1 -O3 -fprofile-use	5881
PC	AMD Opteron 144	1804	gcc -O3 -m64	4599
Dual CPU server	AMD Opteron 242	1593	gcc-3.4.2 -O3 -m64	4395
Itanium server	Intel Itanium2 18MB cache	1600	icc	3632
Apple Powermac	G5	2000	gcc-4.0.0 -O3 -fast	3290
PC running Debian	AMD Athlon 64 3400+	2400	javac -O3 / jre1.6.0	3177
ASUS A8N5X Windows XP/32	AMD Athlon64 4800+	2400	Visual C 8.0 /O2...	3141
PC	AMD Athlon XP 2700+	2171	gcc-3.3.3 -O3	2337
Dual CPU server	AMD Opteron 242	1593	jdk1.5.0-amd64 -O	2086
Apple PowerMac	G4 7455	1467	gcc 4.0 -O3 -fast -mcpu=7450	2043
PC	AMD Athlon XP 2700+	2171	jdk1.5.0/bin/javac -O	1284
Raspberry Pi Model B	ARMv6 16/128kb	700	gcc -O3	465
PC	AMD Athlon XP 2700+	2171	ghc-6.4 -O	106

This integer benchmark solves positions in the game of Connect-4, as played on a vertical 7x6 board.

By default, it uses a 64MB transposition table with the twobig replacement strategy. Positions are represented as 64-bit bitboards, and the hash function is computed with a single 64-bit modulo operation, which gives 64-bit machines a slight edge. The alpha-beta searcher sorts moves dynamically based on the history heuristic. A move that causes a cutoff is rewarded with as many points as there were moves tried before it, each of which gets a -1 penalty. This preserves the total weight and avoids renormalization (uniform penalties were found to work much better than depth-dependent ones).
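
The weight-preserving history update described above can be sketched in C as follows. This is a minimal illustration, not the code from SearchGame.c: the names reward_cutoff, history_total and NCELLS, and the choice of one score per board cell, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table size: one history score per cell of the 7x6 board. */
#define NCELLS 42

/*
 * tried[0 .. ntried-1] are the moves searched at a node, in order;
 * the last one, tried[ntried-1], is the move that caused the cutoff.
 * Every earlier move loses 1 point and the cutoff move gains ntried-1,
 * so the sum over the whole table does not change.
 */
void reward_cutoff(int history[NCELLS], const int tried[], size_t ntried)
{
    assert(ntried >= 1);
    for (size_t i = 0; i + 1 < ntried; i++)
        history[tried[i]] -= 1;
    history[tried[ntried - 1]] += (int)(ntried - 1);
}

/* Sum of all history scores; unchanged by any call to reward_cutoff. */
long history_total(const int history[NCELLS])
{
    long sum = 0;
    for (size_t i = 0; i < NCELLS; i++)
        sum += history[i];
    return sum;
}
```

For instance, if four moves were tried and the fourth caused the cutoff, the first three each drop by 1 and the fourth gains 3. Because the total never drifts, the scores need no periodic rescaling.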

The default input file features 4 positions of increasing complexity, the last one being the starting position. Completing this benchmark amounts to solving the game of Connect-4! This takes about 10 minutes on contemporary PCs.

Gprof shows the following distribution of operations:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 28.15     13.66    13.66        2     6.83    21.39  ab
 25.47     26.03    12.37 47489097     0.00     0.00  haswon
 14.40     33.02     6.99  5596155     0.00     0.00  transpose
  6.49     36.17     3.15 88085127     0.00     0.00  islegal
  6.19     39.17     3.00 30986763     0.00     0.00  islegalhaswon
  4.10     41.16     1.99        2     0.99     0.99  emptyTT
  3.84     43.03     1.86        2     0.93     0.93  htstat
  3.58     44.77     1.74                             __umoddi3
  2.43     45.95     1.18  8768342     0.00     0.00  makemove
  2.43     47.13     1.18  8768326     0.00     0.00  backmove
  1.12     47.67     0.55  3875486     0.00     0.00  transtore
  1.07     48.20     0.52  5596155     0.00     0.00  hash
  0.39     48.38     0.19  5596155     0.00     0.00  positioncode
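
About a quarter of the run time in this profile goes to haswon, the four-in-a-row test. The sketch below shows how such a test can be done with a few shifts and ANDs on a bitboard. It assumes a layout of 7 bits per column (6 playable rows plus one always-empty spare bit on top, so that lines cannot wrap between columns); the names cell and has_four are chosen for this example and are not taken from SearchGame.c.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH  7             /* columns on the standard board */
#define HEIGHT 6             /* rows on the standard board */
#define H1 (HEIGHT + 1)      /* bits per column, including the spare top bit */

/* Bit for the cell at (col, row), row 0 being the bottom (assumed layout). */
uint64_t cell(int col, int row)
{
    assert(col >= 0 && col < WIDTH && row >= 0 && row < HEIGHT);
    return (uint64_t)1 << (col * H1 + row);
}

/* Nonzero if b, holding one player's stones, contains four in a line. */
int has_four(uint64_t b)
{
    /* Bit distance between neighbours: vertical, horizontal, two diagonals. */
    static const int step[4] = { 1, H1, H1 - 1, H1 + 1 };
    for (int i = 0; i < 4; i++) {
        uint64_t pairs = b & (b >> step[i]);       /* two adjacent stones */
        if (pairs & (pairs >> (2 * step[i])))      /* two pairs, two steps apart */
            return 1;
    }
    return 0;
}
```

Each direction costs two shifts and two ANDs regardless of how many stones are on the board, which is why this kind of test suits a function called tens of millions of times.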

An example benchmark run looks like:

(myprompt) gcc -O SearchGame.c -o SearchGame
(myprompt) ./SearchGame  < inputs
Fhourstones 3.1 (C)
Boardsize = 7x6
Using 8306069 transposition table entries.
 
Solving 8-ply position after 45461667 . . .
score = 5 (+)  work = 14
51596 pos / 23 msec = 2243.3 Kpos/sec
- 0.281  < 0.000  = 0.001  > 0.001  + 0.716
 
Solving 8-ply position after 35333571 . . .
score = 1 (-)  work = 21
8716732 pos / 3863 msec = 2256.5 Kpos/sec
- 0.271  < 0.036  = 0.020  > 0.089  + 0.584
 
Solving 8-ply position after 13333111 . . .
score = 3 (=)  work = 26
169704432 pos / 72417 msec = 2343.4 Kpos/sec
- 0.216  < 0.144  = 0.021  > 0.242  + 0.377
 
Solving 0-ply position after  . . .
score = 5 (+)  work = 29
1479113766 pos / 632956 msec = 2336.8 Kpos/sec
- 0.249  < 0.125  = 0.027  > 0.191  + 0.408

Performance is expressed in the intuitively appealing and meaningful measure of thousands of positions per second. The above machine achieved almost 2337 fhourstones, where a fhourstone is taken as a thousand positions searched per second.
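
The Kpos/sec figure on each result line is simply positions divided by milliseconds, since positions per millisecond equals thousands of positions per second. A small C sketch of that arithmetic follows; the function name kpos_per_sec is an assumption made for this example.

```c
#include <assert.h>

/* Positions per millisecond equals thousands of positions per second. */
double kpos_per_sec(unsigned long long positions, unsigned long long msec)
{
    assert(msec > 0);
    return (double)positions / (double)msec;
}
```

With the figures from the last position of the run above, kpos_per_sec(1479113766ULL, 632956ULL) gives roughly 2336.8, matching the reported value.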

The benchmark is currently available in C, Java, and Haskell:

Fhourstones.tar.gz (11396 bytes)

and is also available on GitHub. The latter, courtesy of Toby Thain, provides a Makefile and support for Windows systems.

The C version can be compiled and run with:

(myprompt) gcc -O3 SearchGame.c -o SearchGame
(myprompt) ./SearchGame < inputs

The Java version can be compiled and run with:

(myprompt) javac -O SearchGame.java
(myprompt) java -Xmx70m SearchGame < inputs

The Haskell version can be compiled and run with:

(myprompt) ghc -O --make Main.hs
(myprompt) ./a.out < inputs

Note that the Java version measures real time, for lack of a CPU-time measure. On my machine, the Java version is 75% slower than the C one, which I find hard to explain. Please let me know if you find any bugs or have any comments or suggestions for improvements. Also, please email any benchmark results to me, mentioning machine, CPU, clock speed, cache size, and compiler options. I'm mostly interested in new records and CPUs not yet listed.

Happy benching!

john.tromp@gmail.com