[pic of c4 board] The Fhourstones Benchmark (version 3.1)

MACHINE CPU MHZ COMPILER/JVM KPOS/S
Apple iMac (late 2013) Intel Core i5 3200 clang -O3 12123
PC running Linux Intel Xeon E5-2687W 3660 cc -O3 -march=native 12032
PC running Ubuntu 9.10 Intel Core i7 975 3333 gcc-4.4.1 -O3 -march=native -m64 10741
PC running Ubuntu 9.10 Intel Core i7 920 2667 gcc-4.4.1 -O3 9069
MacBook Air 13.3 running OS X Lion Intel Core i5 1700 cc -O3 8089
PC running RedHat Linux Xeon 5160 3000 gcc-4.1.2 -O3 7653
homemade workstation overclocked AMD Opteron 250 2592 gcc-4.1.2 -march=opteron -m64 -O3 6501
Supermicro PC Intel Core2 duo 6600 2400 gcc -m64 -O3 6444
PC running Ubuntu 9.10 Intel Core i7 975 3333 javac.1.6.0 -O; java -d64 -Xms1024m -Xmixed 6310
ASUS A8N5X Windows XP/64 AMD Athlon64 4800+ 2400 Visual C 8.0 /O2... 6141
MacBook Pro Intel Core 2 Duo T7500 2200 gcc-4.3.0 -O3 -m64 ... 6077
PC running Debian AMD Athlon 64 3400+ 2400 gcc-4.0.3-1 -O3 -fprofile-use 5881
PC AMD Opteron 144 1804 gcc -O3 -m64 4599
Dual CPU server AMD Opteron 242 1593 gcc-3.4.2 -O3 -m64 4395
Itanium server Intel Itanium2 18MB cache 1600 icc 3632
Apple Powermac G5 2000 gcc-4.0.0 -O3 -fast 3290
PC running Debian AMD Athlon 64 3400+ 2400 javac -O3 / jre1.6.0 3177
ASUS A8N5X Windows XP/32 AMD Athlon64 4800+ 2400 Visual C 8.0 /O2... 3141
PC AMD Athlon XP 2700+ 2171 gcc-3.3.3 -O3 2337
Dual CPU server AMD Opteron 242 1593 jdk1.5.0-amd64 -O 2086
Apple PowerMac G4 7455 1467 gcc 4.0 -O3 -fast - mcpu=7450 2043
PC AMD Athlon XP 2700+ 2171 jdk1.5.0/bin/javac -O 1284
Raspberry Pi Model B ARMv6 16/128kb 700 gcc -O3 465
PC AMD Athlon XP 2700+ 2171 ghc-6.4 -O 106

This integer benchmark solves positions in the game of connect-4, as played on a vertical 7x6 board.

By default, it uses a 64Mb transposition table with the twobig replacement strategy. Positions are represented as 64-bit bitboards, and the hash function is computed using a single 64-bit modulo operation, giving 64-bit machines a slight edge. The alpha-beta searcher sorts moves dynamically based on the history heuristic. A move causing a cutoff is rewarded as many points as moves previously tried, each of which gets a -1 penalty, thus preserving total weight and avoiding renormalization (uniform penalties were found to work much better than depth dependent ones).

The default input file features 4 positions of increasing complexity, the last one being the starting position. Completing this benchmark amounts to solving the game of Connect-4! This takes about 10 minutes on contemporary PCs.

Gprof shows the following distribution of operations:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 28.15     13.66    13.66        2     6.83    21.39  ab
 25.47     26.03    12.37 47489097     0.00     0.00  haswon
 14.40     33.02     6.99  5596155     0.00     0.00  transpose
  6.49     36.17     3.15 88085127     0.00     0.00  islegal
  6.19     39.17     3.00 30986763     0.00     0.00  islegalhaswon
  4.10     41.16     1.99        2     0.99     0.99  emptyTT
  3.84     43.03     1.86        2     0.93     0.93  htstat
  3.58     44.77     1.74                             __umoddi3
  2.43     45.95     1.18  8768342     0.00     0.00  makemove
  2.43     47.13     1.18  8768326     0.00     0.00  backmove
  1.12     47.67     0.55  3875486     0.00     0.00  transtore
  1.07     48.20     0.52  5596155     0.00     0.00  hash
  0.39     48.38     0.19  5596155     0.00     0.00  positioncode

An example benchmark run looks like

(myprompt) gcc -O SearchGame.c -o SearchGame
(myprompt) ./SearchGame  < inputs
Fhourstones 3.1 (C)
Boardsize = 7x6
Using 8306069 transposition table entries.
 
Solving 8-ply position after 45461667 . . .
score = 5 (+)  work = 14
51596 pos / 23 msec = 2243.3 Kpos/sec
- 0.281  < 0.000  = 0.001  > 0.001  + 0.716
 
Solving 8-ply position after 35333571 . . .
score = 1 (-)  work = 21
8716732 pos / 3863 msec = 2256.5 Kpos/sec
- 0.271  < 0.036  = 0.020  > 0.089  + 0.584
 
Solving 8-ply position after 13333111 . . .
score = 3 (=)  work = 26
169704432 pos / 72417 msec = 2343.4 Kpos/sec
- 0.216  < 0.144  = 0.021  > 0.242  + 0.377
 
Solving 0-ply position after  . . .
score = 5 (+)  work = 29
1479113766 pos / 632956 msec = 2336.8 Kpos/sec
- 0.249  < 0.125  = 0.027  > 0.191  + 0.408
Performance is expressed in the intuitively appealing and meaningful measure of thousands of positions per second. The above machine achieved almost 2337 fhourstones, where a fhourstone is taken as a thousand positions searched per second.

The benchmark is currently available in C, Java, and haskell:

and is also available on github. The latter, courtesy of Toby Thain, provides a Makefile and support for Windows systems,

The C version can be compiled and run with

(myprompt) gcc -O3 SearchGame.c
(myprompt) ./SearchGame < inputs
The Java version can be compiled and run with
(myprompt) javac -O SearchGame.java
(myprompt) java -Xmx70m SearchGame < inputs
The Haskell version can be compiled and run with
(myprompt) ghc -O --make Main.hs
(myprompt) ./a.out < inputs
Note that Java measures real-time, for lack of a cpu-time measure. On my machine, the Java version is 75% slower than the C one, which I find hard to explain. Please let me know if you find any bugs, have any comments, or suggestions for improvements. Also please email any bench results to me mentioning machine, cpu, clockspeed, cache size, and compiler options. I'm mostly interested in new records and cpu's not yet listed.

Happy benching!


Back to my home page.
john.tromp@gmail.com