Progress in increasing the number of particles in -body simulations of star clusters has been slow. In the sixties, was measured in the tens; in the seventies in the hundreds; and in the eighties in the thousands. Peanuts indeed, compared to cosmological simulations with and more particles! Why are these numbers so different?

The main reason is the fact that star clusters evolve on the time scale of the two-body relaxation time at the half-mass radius, . In terms of the half-mass crossing time , . In addition, the use of tree codes or other codes based on potential solvers is not practical, given the high accuracy required. This follows from the fact that an evolving star cluster is nearly always close to thermal equilibrium. Therefore, small errors in the orbits of individual particles can give rise to relatively much larger errors in the heat flux through the system.

As a result, the cost of a simulation of star cluster evolution scales roughly , since crossing times are needed for each relaxation time, and all gravitational interactions between the stars have to be evaluated many times per crossing time. In contrast, most simulations in galactic dynamics span a fixed number of crossing times, and can employ an algorithm with a computational cost scaling . It is this difference, of roughly a factor , which has kept cluster modeling stuck in such a modest range of -values.

*What it Takes*

The above rough estimates, although suggestive, are not very reliable. In an evolving star cluster, large density gradients will be formed, and sophisticated integration schemes can employ individual time steps and various regularization mechanisms to make the codes vastly more efficient (see the next section). One might have hoped that the above scaling estimates might have turned out to be too pessimistic. Indeed, empirical evidence based on runs in the range seemed to indicated a scaling of the computational cost , rather than , per crossing time (Aarseth 1985).

However, a detailed analysis of the scaling behavior of various integration schemes (Makino &Hut 1988) showed that the relatively more mild empirical scaling must have been an artifact of the still rather small values it was based on. Instead, the best asymptotic scaling reported by Makino &Hut, based on a lengthy analysis of two-timescale methods, resulted in a computational cost per crossing time in a homogeneous system, as well as in an isothermal density distribution. For steeper density distributions, with the velocity dispersion increasing towards the center, they found even slightly higher cost estimates.

With this staggering scaling of the computational cost, slightly worse than , it would seem worthwhile to look for some type of approximate method, in which part of the star cluster is modeled in a statistical fashion. Indeed, such a hybrid approach, using a combination of Fokker-Planck methods and direct -body integration, was carried out by McMillan and Lightman (1984ab) and McMillan (1985, 1986). This approach proved to be successful in modeling a star cluster around the moment of core-collapse. However, these simulations could not be extended significantly beyond core collapse, for lack of computer time.

When Hut, Makino &McMillan (1988) applied the analysis by Makino &Hut to McMillan's hybrid code, as well as to a variety of other algorithms, they reached a pessimistic conclusion. In order to model even a modest globular cluster with stars, even with the theoretically most efficient integration scheme, would carry a cost of several Teraflops-days (a Teraflops equals floating point operations per second, and a Teraflops-day therefore corresponds to floating point operations). More realistically, they concluded, for a relatively simple and vectorizable or parallelizable algorithm, the computational cost would lie in the range of Teraflops-months.

*How to Get There*

Available computer speed increases by a factor of nearly two each year, or a factor of per decade. Ten years ago, the speed of personal computers (or a per-person-averaged speed of a VAX), was measured in kflops, while current workstations are measured in terms of Mflops. Extrapolating, by the year 2013 we can expect each of us to have a multi-Teraflop machine on our desk, enabling us to core-collapse a globular cluster in a week or so.

If we look at the speed of the fastest supercomputers available, a similar scaling holds. Ten years ago, astrophysicists could have occasional access to a supercomputer, but when averaged to a sustained speed, it would boil down to only a fraction of a Mflops speed. And indeed, by now the best one can hope to obtain from a NSF supercomputer center is an average speed of a fraction of a Gflops. It would seem that the evolution of even a modest globular cluster would not be possible until some time around the year 2003.

Fortunately, we do not have to wait that long. Following the rather
pessimistic conclusions of Hut *et al.* (1988), a project was started to
construct special-purpose hardware for -body calculations, by a
small group of astrophysicists at Tokyo University (Sugimoto *et al.*
1990). This GRAPE project (from `GRAvity PipE') centers on the
development of parallel Newtonian-force-accelerators, in analogy to
the idea of using floating-point accelerators to speed up
workstations. During the last three years, a variety of GRAPE
versions has been completed (and some of them distributed to other
locations outside Japan), and used successfully for several
astrophysical calculations (see, e.g., Funato, Makino &Ebisuzaki
1992).

The next major step in this project will be the development of a Teraflops speed high-accuracy special-purpose computer (the HARP, for `Hermite AcceleratoR Pipeline'; Makino, Kokubo &Taiji 1993), in the form of a set of a few thousand specially designed chips, each with a speed of several hundred Mflops. This machine is expected to be available some time next year.

Thu Feb 24 00:52:57 EST 1994