Next: 8. Summary and Up: No Title Previous: 6. The Physics

7. Kilobyte Needles in Terabyte Haystacks

Generating data is only half the job in any simulation. The other half of a computational theorist's work parallels that of an observer: data reduction. As in the observational case, a good set of tools is essential. Moreover, unless the tools can be used in a flexible and coherent software environment, their usefulness remains limited.

Three requirements are central in handling the data flow from a full-scale globular cluster simulation: modularity, flexibility, and compatibility. We have started to put together a software environment, Starlab (Hut et al. 1993), that incorporates these three requirements. To some extent, Starlab is modeled on NEMO, a stellar dynamics software environment developed six years ago at the Institute for Advanced Study, largely by Josh Barnes with input from Peter Teuben and me; NEMO has subsequently been maintained and extended by Peter Teuben.

Starlab differs from NEMO mainly in three areas: it emphasizes the use of UNIX pipes rather than temporary files; it uses tree structures rather than arrays to represent N-body systems; and it guarantees data conservation, in that data which are not understood by a given module are simply passed on rather than filtered out.

Modularity: A Toolbox Approach

We have followed the UNIX model of combining a large number of small and relatively simple tools through pipes. This allows a quick and compact way of running small test simulations. For example, a study of relaxation effects in a cold collapse could be done as follows:

mkplummer -n 100 | freeze | leapfrog -t 2 -d 0.02 -e 0.05 | lagrad

Here mkplummer creates initial conditions for a 100-body system, according to a Plummer model distribution. The resulting data are piped into the next module, freeze, which simply sets all velocities to zero while preserving the positions. Following that, the data are read in by the leapfrog integrator, which is asked to evolve the system for a period of 2 time units, with a stepsize of 0.02 time units and a softening length of 0.05 length units. Finally, the resulting data are piped into lagrad, a module that plots the Lagrangian radii for various percentiles of the system.
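The two middle stages of this pipeline can be illustrated with a minimal Python sketch. It is not Starlab code: the function names merely echo the module names, the dictionary layout for a body is hypothetical, and the integrator shown is a standard kick-drift-kick leapfrog with Plummer softening and G = 1, which is assumed here rather than taken from the Starlab source.

```python
import math

def freeze(bodies):
    """Return a copy of the system with all velocities set to zero."""
    return [{"m": b["m"], "pos": list(b["pos"]), "vel": [0.0, 0.0, 0.0]}
            for b in bodies]

def accelerations(bodies, eps):
    """Softened pairwise gravity (G = 1): a_i = sum_j m_j d_ij / (d_ij^2 + eps^2)^(3/2)."""
    n = len(bodies)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [bodies[j]["pos"][k] - bodies[i]["pos"][k] for k in range(3)]
            inv_r3 = (sum(x * x for x in d) + eps * eps) ** -1.5
            for k in range(3):
                acc[i][k] += bodies[j]["m"] * d[k] * inv_r3
                acc[j][k] -= bodies[i]["m"] * d[k] * inv_r3
    return acc

def leapfrog(bodies, t_end, dt, eps):
    """Evolve the system in place with a fixed-step kick-drift-kick leapfrog."""
    acc = accelerations(bodies, eps)
    for _ in range(int(round(t_end / dt))):
        for b, a in zip(bodies, acc):
            for k in range(3):
                b["vel"][k] += 0.5 * dt * a[k]   # half kick
                b["pos"][k] += dt * b["vel"][k]  # drift
        acc = accelerations(bodies, eps)
        for b, a in zip(bodies, acc):
            for k in range(3):
                b["vel"][k] += 0.5 * dt * a[k]   # second half kick
    return bodies

def total_energy(bodies, eps):
    """Kinetic plus softened potential energy, useful as an accuracy check."""
    energy = sum(0.5 * b["m"] * sum(v * v for v in b["vel"]) for b in bodies)
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            d2 = sum((bodies[j]["pos"][k] - bodies[i]["pos"][k]) ** 2
                     for k in range(3))
            energy -= bodies[i]["m"] * bodies[j]["m"] / math.sqrt(d2 + eps * eps)
    return energy
```

With these definitions, `leapfrog(freeze(bodies), 2, 0.02, 0.05)` mirrors the arguments `-t 2 -d 0.02 -e 0.05` in the command line above, and comparing `total_energy` before and after gives a quick check on the chosen stepsize.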

Flexibility: Structured Data Representation

Each snapshot of an N-body simulation can be stored in a file in a standard format, with a header indicating the nature of the snapshot. In addition, a list of all the commands used to create the data is stored at the top of the file, together with the time at which the commands were issued, so as to minimize the uncertainty about the exact procedures used. Each individual body is represented as a node in a tree, constructed so as to reflect the presence of closely interacting subsystems and their internal structure.

Each body has several unstructured `scratch pads', in which each application program can write diagnostics or other comments describing particular occurrences during the integration. This has proved to be extremely useful, by allowing various forms of data reduction to take place already during the run. Especially during complicated interactions involving stellar dynamics, stellar hydrodynamics, and stellar evolution effects, a free-format reporting system, tied to the individual interacting objects, will be very helpful in allowing a reconstruction of episodes of greatest interest.
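The tree representation and the per-body scratch pads can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Node`, its fields, and the helper `find_notes` are hypothetical names, not Starlab's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One body, or a subsystem (such as a binary) whose members are child nodes."""
    name: str
    mass: float = 0.0                                   # used for leaf nodes only
    children: List["Node"] = field(default_factory=list)
    log: List[str] = field(default_factory=list)        # free-format scratch pad

    def add(self, child):
        """Attach a child node and return it, so that trees can be built inline."""
        self.children.append(child)
        return child

    def note(self, text):
        """Append a free-format diagnostic to this body's scratch pad."""
        self.log.append(text)

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def leaves(self):
        """Yield the individual stars, that is, the nodes without children."""
        return (node for node in self.walk() if not node.children)

    def total_mass(self):
        """Mass of a leaf, or the summed mass of all members of a subsystem."""
        if not self.children:
            return self.mass
        return sum(child.total_mass() for child in self.children)

def find_notes(root, keyword):
    """Collect (body name, note) pairs whose note mentions the keyword."""
    return [(node.name, text)
            for node in root.walk() for text in node.log if keyword in text]
```

A binary is then simply a node with two children, and a call such as `find_notes(cluster, "mass transfer")` picks out the episodes of interest after the run, because each remark stays attached to the body that produced it.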

Compatibility: Unfiltered Piping

The internal data representation of each module is such that unrecognized quantities or comments are stored internally, in the form of character strings. They are reproduced at the time of output, at the correct position, preserving their correspondence with the initial bodies they were associated with (some of which may have collided and merged). This allows the use of an arbitrary combination of pipes with the guarantee that no data or comments will be lost.

For example, in the commands

evolve | mark_core | HR_plot | evolve

the first module evolves the system, integrating the equations of motion, while also following the way the individual stars age and interact hydrodynamically. The second program computes the location and size of the core of the star cluster, and marks those particles that are within one core radius from the center. The third module plots a Hertzsprung-Russell diagram of the star cluster (perhaps using special symbols for the core stars), before passing on the data once more to the module that evolves the whole system. For this to work, the mark_core program needs to preserve the stellar evolution information, even though it only `knows' about the stellar dynamical part of the data. Similarly, the HR_plot program needs to preserve the dynamical data.
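The pass-through behaviour required of mark_core can be sketched as follows. The text format used here (a `body` line followed by `key = value` lines) is a hypothetical stand-in, not Starlab's file format, and the core radius is passed in as a parameter with the core centered on the origin, whereas the real module computes both. The point of the sketch is only that every line the module does not understand is kept as a character string and written back in its original position.

```python
import math

def read_bodies(text):
    """Split a snapshot into items, keeping every raw line, parsed or not."""
    items = [{"lines": [], "pos": None}]        # first item holds any preamble
    for line in text.splitlines():
        if line.startswith("body"):
            items.append({"lines": [], "pos": None})
        items[-1]["lines"].append(line)          # stored verbatim as a string
        key, sep, value = line.partition("=")
        if sep and key.strip() == "pos":         # the only quantity this module knows
            items[-1]["pos"] = [float(v) for v in value.split()]
    return items

def write_bodies(items):
    """Reproduce all stored lines in their original order."""
    return "".join(line + "\n" for item in items for line in item["lines"])

def mark_core(text, r_core):
    """Tag bodies within r_core of the origin; pass everything else through."""
    items = read_bodies(text)
    for item in items:
        pos = item["pos"]
        if pos is not None and math.sqrt(sum(x * x for x in pos)) <= r_core:
            item["lines"].append("  in_core = 1")
    return write_bodies(items)
```

Because unknown lines are never interpreted, a stellar evolution quantity such as an effective temperature survives the dynamical module untouched, and a plotting module built the same way would preserve the positions and velocities in turn.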


Thu Feb 24 00:52:57 EST 1994