/usr/bin/time $NEMO/src/scripts/nemo.bench make -f Benchfile clean all nemobench COMMAND
- The command make bench from the $NEMO directory runs a standard benchmark of about a dozen selected NEMO programs. It should run in about 30 seconds on a ~2020 computer. A short table (see below in RESULTS) lists results from some recent computers.
- Various Benchfile filles scattered through the $NEMO source tree contain benchmarks. There is no wrapper like "make check" has for Testfile’s
- Selected man pages have some examples. Over time these should be moved into the Benchfile in the source directory where this program is maintained. Some are mentioned below in this man page.
Intel Core i5-1135G7 15.4 XPS13 1460 / AMD EPYC 7302 @ 3.0GHz 23.9 lma 1000/ 28000 Intel Core i9-9900X 25.8 geminid 1214 / 9467 i5-10210U CPU @ 1.60GHz 29.4 X1Y4 1029 / 3194 Xeon(R) Silver 4114 @ 2.20GHz 37.1 terra 777 / 10213 i7-8550U CPU @ 1.80GHz 40.6 T480 (1600) 696 / 1935 Xeon E-2186G 3.80GHz 52.6 aurora 1313 / 6721 i7-3820 CPU @ 3.60GHz ??/67.5 dante 833 / 3693 Xeon(R) CPU E3-1280 @ 3.50GHz ??/70.3 chara 828 / 3117 i7-3630QM CPU @ 2.40GHz 76.2 T530 (1200) 762 / 3006 Xeon(R) E5-2623 v4 @ 2.60GHz 42.1/79.4 kraken 782 / 5800 Xeon(R) X5550 @ 2.67GHz 115.4 sdp 566 / 3938
Keep in mind for most of these a default compilation was used. Some benchmarks are known to be able to improved by up to a factor of two with selected compiler and options changes.
The following tasks are run in the standard
NEMO bench, for details see the src/scripts/nemo.bench script.
directcode nbody=3072 out=d0 seed=123 hackcode1 nbody=10240 out=h0 seed=123 mkplummer p0 10240 seed=123 gyrfalcON p0 p1 kmax=6 tstop=2 eps=0.05 potcode p0 p2 freqout=10 freq=1000 tstop=2 potname=plummer mkspiral s0 $nbody3 nmodel=40 seed=123 ccdmath "" c0 ’ranu(0,1)’ size=256 seed=123 ccdpot c0 c1 mkorbit o0 x=1 e=1 lz=1 potname=log orbint o0 o1 nsteps=10000000 dt=0.001 nsave=100000In addition each data file that is produced is checksummed and compared to a baseline version using bsf(1NEMO) if the argument bsf=1 is added.
i9-12900H - n/a 1851 M1-Max - n/a 1785 i7-12800H - n/a 1654 i9-11980HK - n/a 1616 M1 (Macbook Air) 1269 n/a 1750 / 7xxxx i7-11800H - n/a 1474 Intel Core i5-1135G7 1000 XPS13 1460 / AMD threadripper 3960x 896 AlexBox AMD EPYC 7302 @ 3.0GHz 630 lma 1000/ 28000 Intel Core i9-9900X - geminid 1214 / 9467 Xeon Gold 5218 @ 2.30GHz 761 unity/node99 1100 / 8610 i5-10210U CPU @ 1.60GHz 732 X1Y4 1029 / 3194 Xeon(R) Silver 4114 @ 2.20GHz - terra 777 / 10213 i7-8550U CPU @ 1.80GHz 548 T480 (1600) 696 / 1935 Xeon E-2186G 3.80GHz - aurora 1313 / 6721 i7-3820QM @ 2.70 GHz 498 MacBookPro (Retina, Mid 2012) i7-3820 CPU @ 3.60GHz 414 dante 833 / 3693 Xeon(R) CPU E3-1280 @ 3.50GHz - chara 828 / 3117 i7-3630QM CPU @ 2.40GHz 429 T530 (1200) 762 / 3006 Xeon E5-2623 v4 @ 2.60GHz - kraken 782 / 5800 Xeon X5550 @ 2.67GHz 256 sdp 566 / 3938
/usr/bin/time hackcode1 tstop=2 > /dev/null
On a Sun 3/50 (our development machine) this took about 5 seconds per step. Now, nearly 35 years later, my laptop runs this about 50,000 times faster. Looking in more detail at the original NEMO manual:
cpu/steps sun 3/60: 20 MHz 2.28 i5-1135G7: 4200 MHz 0.0000875Despite that the cpu was 210 times faster, the code ran 26,000 faster. A very impressive factor of 120 improvement in chip and possibly some compiler technology. The NEMO users guide has an appendix in which this benchmark is listed for a variety of computers since 1986.
% time ls 0.012u 0.068s 0:00.77 9.0% 0+0k 8376+0io 0pf+0w 2.324u 1.080s 0:09.25 36.7% 0+0k 1049384+2097160io 2pf+0w 1.876u 0.788s 0:03.63 73.0% 0+0k 0+2097160io 0pf+0wOn linux the command
echo 1 > > /proc/sys/vm/drop_cacheswill clear the disk cache in memory, so your program will be forced to read from disk, with all possible interference from other programs
another useful addition to the benchmark is that the output can be turned
off easily, by using out=., viz.
% sudo $NEMO/src/scripts/clearcache % time ccdsmooth n1 . dir=x 0.852u 1.068s 0:12.41 15.3% 0+0k 2098312+0io 6pf+0w 0.812u 0.400s 0:01.21 100.0% 0+0k 0+0io 0pf+0w 0.820u 0.380s 0:01.20 100.0% 0+0k 0+0io 0pf+0wwhere the last two instances were just re-running the same command, but now clearly showing the effect of reading the file from memory instead of disk. By repeating this whole series a few times, an lower bound to the wall clock time is more likely to properly account for the I/O overhead time.
Rule of thumb: always run a benchmark a few times to see if a hot CPU slows down the benchmark. If I/O is cached. Other tasks are interfering.
Other industry benchmarks:
Geekbench 5 (very wide variety of compute workloads - baseline is i3-8100) Linpack (focus on floating point operations - Gflops) SPEC CPU 2017 ($$$) benchmark -
/usr/bin/time tabgen tab3 100000000 3 /usr/bin/time tabbench2 . mode=-1 this bench will need to be repeated for mode=0,1,2,3 to estimate the different components as they are added to the workflow. The tabgen(1NEMO) is dominated by drawing random numbers and writing them using printf(3) , which is slow. 80s writing, using tabgen 2s reading in tabbench2 22s parsing in numbers [np.loadtxt takes 748 sec!!!] 6s using fie(3NEMO) to compute radii 1s using np.sqrt(), and presumably C’s sqrt() as well
nbody=10000000 /usr/bin/time mkplummer . $nbody 2.80user 0.45system 0:03.26elapsed 99%CPU echo mkplummer . $nbody > run.txt echo mkplummer . $nbody >> run.txt /usr/bin/time parallel --jobs 1 < run.txt 5.89user 0.83system 0:06.71elapsed 100%CPU /usr/bin/time parallel --jobs 2 < run.txt 6.00user 0.79system 0:03.44elapsed 197%CPU
which follows Amdahl’s law close to 100%!
$NEMO/src/scripts/nemo.bench Script uses by make bench/bench5/bench10 $NEMO/data standard repository area for (small) data files. Benchfile A Makefile that can orchestrate series of benchmarks /tmp/nemobench.log The nemobench keeps logfile
12-may-97 created PJT 26-nov-03 finally added some data PJT 17-feb-04 added bench0 comparison PJT 31-mar-05 added some cygwin numbers, fixed input PJT 6-may-11 added i7 and SHMEM/HDD comparison PJT 27-sep-13 added caveats PJT 6-jan-2018 updated for V4, more balanced benchmarks PJT 27-dec-2019 nemo.bench; updated with potcode and orbint PJT 26-jul-2020 added timings / added geekbench5 PJT
Table of Contents