MACHINE | CPU | MHZ | COMPILER/JVM | KPOS/S |
Apple iMac (late 2013) | Intel Core i5 | 3200 | clang -O3 | 12123 |
PC running Linux | Intel Xeon E5-2687W | 3660 | cc -O3 -march=native | 12032 |
PC running Ubuntu 9.10 | Intel Core i7 975 | 3333 | gcc-4.4.1 -O3 -march=native -m64 | 10741 |
PC running Ubuntu 9.10 | Intel Core i7 920 | 2667 | gcc-4.4.1 -O3 | 9069 |
MacBook Air 13.3 running OS X Lion | Intel Core i5 | 1700 | cc -O3 | 8089 |
PC running RedHat Linux | Xeon 5160 | 3000 | gcc-4.1.2 -O3 | 7653 |
homemade workstation | overclocked AMD Opteron 250 | 2592 | gcc-4.1.2 -march=opteron -m64 -O3 | 6501 |
Supermicro PC | Intel Core 2 Duo 6600 | 2400 | gcc -m64 -O3 | 6444 |
PC running Ubuntu 9.10 | Intel Core i7 975 | 3333 | javac.1.6.0 -O; java -d64 -Xms1024m -Xmixed | 6310 |
ASUS A8N5X Windows XP/64 | AMD Athlon64 4800+ | 2400 | Visual C 8.0 /O2... | 6141 |
MacBook Pro | Intel Core 2 Duo T7500 | 2200 | gcc-4.3.0 -O3 -m64 ... | 6077 |
PC running Debian | AMD Athlon 64 3400+ | 2400 | gcc-4.0.3-1 -O3 -fprofile-use | 5881 |
PC | AMD Opteron 144 | 1804 | gcc -O3 -m64 | 4599 |
Dual CPU server | AMD Opteron 242 | 1593 | gcc-3.4.2 -O3 -m64 | 4395 |
Itanium server | Intel Itanium2 18MB cache | 1600 | icc | 3632 |
Apple PowerMac | G5 | 2000 | gcc-4.0.0 -O3 -fast | 3290 |
PC running Debian | AMD Athlon 64 3400+ | 2400 | javac -O3 / jre1.6.0 | 3177 |
ASUS A8N5X Windows XP/32 | AMD Athlon64 4800+ | 2400 | Visual C 8.0 /O2... | 3141 |
PC | AMD Athlon XP 2700+ | 2171 | gcc-3.3.3 -O3 | 2337 |
Dual CPU server | AMD Opteron 242 | 1593 | jdk1.5.0-amd64 -O | 2086 |
Apple PowerMac | G4 7455 | 1467 | gcc 4.0 -O3 -fast -mcpu=7450 | 2043 |
PC | AMD Athlon XP 2700+ | 2171 | jdk1.5.0/bin/javac -O | 1284 |
Raspberry Pi Model B | ARMv6 16/128kb | 700 | gcc -O3 | 465 |
PC | AMD Athlon XP 2700+ | 2171 | ghc-6.4 -O | 106 |
This integer benchmark solves positions in the game of Connect-4, as played on a vertical 7x6 board.
By default, it uses a 64MB transposition table with the twobig replacement strategy. Positions are represented as 64-bit bitboards, and the hash function is computed using a single 64-bit modulo operation, giving 64-bit machines a slight edge. The alpha-beta searcher sorts moves dynamically based on the history heuristic. A move causing a cutoff is rewarded with as many points as there were moves tried before it, each of which gets a -1 penalty, thus preserving total weight and avoiding renormalization (uniform penalties were found to work much better than depth-dependent ones).
The default input file features 4 positions of increasing complexity, the last one being the starting position. Completing this benchmark amounts to solving the game of Connect-4! This takes about 10 minutes on contemporary PCs.
Gprof shows the following distribution of operations:
     %  cumulative      self               self    total
  time     seconds   seconds     calls   s/call   s/call  name
 28.15       13.66     13.66         2     6.83    21.39  ab
 25.47       26.03     12.37  47489097     0.00     0.00  haswon
 14.40       33.02      6.99   5596155     0.00     0.00  transpose
  6.49       36.17      3.15  88085127     0.00     0.00  islegal
  6.19       39.17      3.00  30986763     0.00     0.00  islegalhaswon
  4.10       41.16      1.99         2     0.99     0.99  emptyTT
  3.84       43.03      1.86         2     0.93     0.93  htstat
  3.58       44.77      1.74                              __umoddi3
  2.43       45.95      1.18   8768342     0.00     0.00  makemove
  2.43       47.13      1.18   8768326     0.00     0.00  backmove
  1.12       47.67      0.55   3875486     0.00     0.00  transtore
  1.07       48.20      0.52   5596155     0.00     0.00  hash
  0.39       48.38      0.19   5596155     0.00     0.00  positioncode
An example benchmark run looks like this:

    (myprompt) gcc -O SearchGame.c -o SearchGame
    (myprompt) ./SearchGame < inputs
    Fhourstones 3.1 (C)
    Boardsize = 7x6
    Using 8306069 transposition table entries.

    Solving 8-ply position after 45461667 . . .
    score = 5 (+)  work = 14
    51596 pos / 23 msec = 2243.3 Kpos/sec
    - 0.281  < 0.000  = 0.001  > 0.001  + 0.716

    Solving 8-ply position after 35333571 . . .
    score = 1 (-)  work = 21
    8716732 pos / 3863 msec = 2256.5 Kpos/sec
    - 0.271  < 0.036  = 0.020  > 0.089  + 0.584

    Solving 8-ply position after 13333111 . . .
    score = 3 (=)  work = 26
    169704432 pos / 72417 msec = 2343.4 Kpos/sec
    - 0.216  < 0.144  = 0.021  > 0.242  + 0.377

    Solving 0-ply position after  . . .
    score = 5 (+)  work = 29
    1479113766 pos / 632956 msec = 2336.8 Kpos/sec
    - 0.249  < 0.125  = 0.027  > 0.191  + 0.408

Performance is expressed in the intuitively appealing and meaningful measure of thousands of positions per second. The above machine achieved almost 2337 fhourstones, where a fhourstone is taken as a thousand positions searched per second.
The benchmark is currently available in C, Java, and Haskell.
The C version can be compiled and run with

    (myprompt) gcc -O3 SearchGame.c -o SearchGame
    (myprompt) ./SearchGame < inputs

The Java version can be compiled and run with

    (myprompt) javac -O SearchGame.java
    (myprompt) java -Xmx70m SearchGame < inputs

The Haskell version can be compiled and run with

    (myprompt) ghc -O --make Main.hs
    (myprompt) ./a.out < inputs

Note that the Java version measures real (wall-clock) time, for lack of a CPU-time measure. On my machine, the Java version is 75% slower than the C one, which I find hard to explain.
Please let me know if you find any bugs or have any comments or suggestions for improvements. Also, please email any benchmark results to me, mentioning machine, CPU, clock speed, cache size, and compiler options. I'm mostly interested in new records and CPUs not yet listed.
Happy benching!