
E.1.1 Treecode

The standard NEMO benchmark for treecode integration is to run hackcode1 without any parameters. It generates a spherical stellar system in virial equilibrium with 128 particles and integrates it for 64 timesteps (tol=1 eps=0.05). The Treecode Benchmarks table lists, in column 2, the CPU time (in seconds) needed for one timestep. Unless otherwise noted, the code used is the standard NEMO hackcode1 with default compilation on the machine quoted. Note that one can often obtain a significant performance increase (a factor of 2 on some SPARC architectures) by studying the native compiler and in particular its optimization options.


Table: Treecode Benchmarks



\begin{tabular}{|l|r|l|l|} \hline
Machine & cpu-sec/s...
...line
386SX (16 MHz) & 87.000 & & (Linux) software floating point\\
\end{tabular}
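
One way to arrive at a per-timestep figure like those in the table is to divide the total CPU time of a run by the number of steps taken. How the table values were actually measured is not stated here, so the following is only a minimal sketch under that assumption. The names `cpu_sec_per_step`, `dummy_state` and `dummy_step` are hypothetical, and the workload is a stand-in, not the treecode.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: call `step` nsteps times and return the average
 * CPU seconds per call, measured with the standard clock() function. */
double cpu_sec_per_step(void (*step)(void *), void *state, int nsteps)
{
    assert(step != NULL && nsteps > 0);
    clock_t start = clock();
    for (int i = 0; i < nsteps; i++)
        step(state);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / nsteps;
}

/* Stand-in workload (NOT the treecode): counts calls and burns a little CPU. */
struct dummy_state {
    int calls;    /* number of steps taken so far */
    double sink;  /* accumulates a result so the loop does real work */
};

void dummy_step(void *p)
{
    struct dummy_state *s = p;
    s->calls++;
    for (int i = 1; i <= 10000; i++)
        s->sink += 1.0 / i;
}
```

Calling `cpu_sec_per_step(dummy_step, &s, 64)` mirrors the 64-step benchmark run. The resolution of `clock()` is platform dependent, so very short steps may report zero.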


The gravsim code (footnote E.1) is better suited for a multiprocessor machine. Its user interface and database format differ from NEMO's, but interface scripts can be defined that make working with this code a little easier. Both of these C versions of the treecode (hackcode1 and gravsim) are inherently slower because they are recursive and spend most of their CPU time in tree walking (with a lot of integer arithmetic). The modified (vectorized) Hernquist Fortran code (referred to as TREECODE V3) runs roughly 200-400 times faster on a CRAY supercomputer than the original code did on a VAX or Sun-3.

It is also worth noting that replacing the sqrt function with a very fast machine-dependent one increases the speed of the C version of the treecode by about 20%. Some recent HP computers have a special hardware floating-point operation to perform 1/sqrt().


(c) Peter Teuben