The gravsim code is better suited for a multiprocessor machine. Its user interface and database format differ from NEMO's, but interface scripts can be defined that make working with this code a little easier. Both C versions of the treecode (hackcode1 and gravsim) are inherently slower, because they are recursive and spend most of their CPU time walking the tree (with a lot of integer arithmetic). The modified (vectorized) Hernquist Fortran code (referred to as TREECODE V3) achieves, on a CRAY supercomputer, a speedup of roughly a factor of 200-400 over the original VAX/Sun-3 speed.
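To illustrate what "recursive" means for the tree walk, the following minimal sketch visits every node of a small octree twice: once with a recursive function, and once with an explicit stack, which is one common way of removing the recursion. The `node` type, the function names and the `MAXDEPTH` bound are all hypothetical and are not taken from hackcode1, gravsim or TREECODE V3; summing the `mass` field merely stands in for the per-node work of a real force calculation.

```c
#include <assert.h>
#include <stddef.h>

#define NSUB      8                 /* octree: up to eight subcells per node */
#define MAXDEPTH  64                /* assumed upper bound on the tree depth */
#define STACKSIZE (NSUB * MAXDEPTH) /* enough for a depth-first walk of that depth */

/* Hypothetical tree node: a mass plus pointers to its subcells. */
typedef struct node {
    double mass;
    struct node *child[NSUB];       /* NULL where there is no subcell */
} node;

/* Recursive walk: one function call per visited node. */
double total_mass_recursive(const node *p)
{
    double sum;
    int i;

    if (p == NULL)
        return 0.0;
    sum = p->mass;
    for (i = 0; i < NSUB; i++)
        sum += total_mass_recursive(p->child[i]);
    return sum;
}

/* Same walk with an explicit stack instead of function calls. */
double total_mass_iterative(const node *root)
{
    const node *stack[STACKSIZE];
    int top = 0, i;
    double sum = 0.0;

    if (root != NULL)
        stack[top++] = root;
    while (top > 0) {
        const node *p = stack[--top];   /* pop the next node to visit */
        sum += p->mass;
        assert(top + NSUB <= STACKSIZE); /* tree deeper than MAXDEPTH */
        for (i = 0; i < NSUB; i++)
            if (p->child[i] != NULL)
                stack[top++] = p->child[i];
    }
    return sum;
}
```

Both functions return the same sum for the same tree. The iterative form replaces the call overhead of the recursion with array indexing, but it still needs a bound on the tree depth to size its stack.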
It is also worth noting that replacing the sqrt function with a very fast machine-dependent one increases the speed of the C version of the treecode by about 20%. Some recent HP computers have a special hardware floating-point operation that computes 1/sqrt().
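As a rough illustration of why 1/sqrt() matters, the sketch below shows a software reciprocal square root and how a softened pairwise force factor can be built from it without calling sqrt or dividing. It is not the routine used by any of the codes above. It uses the widely known bit-level initial guess followed by two Newton-Raphson steps, and it assumes a 32-bit IEEE 754 `float`. The names `rsqrt_approx` and `accel_factor` are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/*
 * Approximate 1/sqrt(x) for x > 0, assuming float is 32-bit IEEE 754.
 * The integer trick gives a crude first guess; each Newton-Raphson
 * step y <- y * (1.5 - 0.5 * x * y * y) roughly squares the relative
 * error, so two steps give about five significant digits.
 */
static float rsqrt_approx(float x)
{
    uint32_t bits;
    float y;

    assert(x > 0.0f);
    memcpy(&bits, &x, sizeof bits);        /* reinterpret float as integer */
    bits = 0x5f3759dfu - (bits >> 1);      /* first guess for 1/sqrt(x) */
    memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);     /* first Newton-Raphson step */
    y = y * (1.5f - 0.5f * x * y * y);     /* second Newton-Raphson step */
    return y;
}

/*
 * Softened force factor mass / (r2 + eps2)^(3/2), where r2 is the
 * squared separation and eps2 the squared softening length.
 * Only multiplications are needed once 1/sqrt is available.
 */
static float accel_factor(float mass, float r2, float eps2)
{
    float rinv = rsqrt_approx(r2 + eps2);
    return mass * rinv * rinv * rinv;
}
```

Multiplying the result of `accel_factor` by the separation vector gives the acceleration contribution of one body or cell. Whether such a replacement actually beats the library sqrt depends on the machine and compiler, so it should be timed before being adopted.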