3. Filestructure
Note
NEMO stores its persistent data in binary files, which under most circumstances
can also be used in a Unix pipe by using the -
symbol. The tsf
program
will show the contents of such files in more human readable form.
Here we give an overview of the file structure of NEMO’s persistent data stored on disk. The popular memory (object) models, and how they interact with persistent data on disk, are discussed in the Programmers Guide. Most of the data handled by NEMO is stored in a specially designed XML-like binary format (designed well before XML was conceived), although exceptions such as ASCII files and tables will also be discussed. Ample examples illustrate creation, manipulation and data transfer. We also mention a few examples of function descriptors, a data format that makes use of the native object file format of your operating system (a.out(5) and dlopen(3)) and that is dynamically loaded at runtime.
3.1. Binary Structured Files
Note
There is also a program called bsf, which computes a benchmark value from the floating point values in the file, for use in regression tests.
Most of the data files used by NEMO share a common low level binary file
structure, which can be viewed as a sequence of tagged data items. Special
symbols are defined to group these items hierarchically into sets. Data items
are typically scalar values or homogeneous arrays constructed from
elementary C data types, but the programmer can also add more complex
structures, such as C’s struct
structure definition, or any user
defined data structure. In this last case tagging by type is not
possible anymore, and support for a machine independent format
is not guaranteed. Using such constructs is not recommended if the
data needs to be portable across platforms.
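The idea of tagged items grouped hierarchically into sets can be sketched in a few lines of Python. This is only an illustration of the concept, not NEMO's binary layout or its C interface: the class names `Item` and `ItemSet` and the function `render` are hypothetical, and the text layout merely imitates the `set` ... `tes` bracketing that tsf prints.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Item:
    """A tagged data item: an elementary type, a tag name, and its values."""
    dtype: str      # e.g. "int", "double", "char"
    tag: str        # e.g. "Nobj", "Time"
    values: list    # a scalar is stored as a one-element list


@dataclass
class ItemSet:
    """A named group of items; sets may contain other sets."""
    tag: str
    members: List[Union["Item", "ItemSet"]] = field(default_factory=list)


def render(node, indent=0, step=2):
    """Return text lines for a node, bracketing each set with 'set' and 'tes'."""
    pad = " " * indent
    if isinstance(node, Item):
        dims = "" if len(node.values) == 1 else f"[{len(node.values)}]"
        vals = " ".join(str(v) for v in node.values)
        return [f"{pad}{node.dtype} {node.tag}{dims} {vals}"]
    lines = [f"{pad}set {node.tag}"]
    for member in node.members:
        lines.extend(render(member, indent + step, step))
    lines.append(f"{pad}tes")
    return lines


# A toy three-body snapshot with made-up values.
snapshot = ItemSet("SnapShot", [
    ItemSet("Parameters", [
        Item("int", "Nobj", [3]),
        Item("double", "Time", [0.0]),
    ]),
    ItemSet("Particles", [
        Item("double", "Mass", [0.5, 0.25, 0.25]),
    ]),
])

listing = render(snapshot)
print("\n".join(listing))
```

Because every item carries its type and tag, a reader can skip items it does not know about, which is what makes such files self-describing.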
The hierarchical structure of a binary file in this general format can
be viewed in human-readable format at the terminal using a special
program, tsf
(“type structured file”).
Its counterpart, rsf
(“read structured file”),
converts such human-readable files (in a special ASCII Structured
File format, ASF) into binary structured files (BSF).
In principle it is hence
possible to transfer data files between different types of computers
using rsf
and tsf
(see examples in Section~ref{s:exch-data}).
Let us start with a small example: With the NEMO
program mkplummer
we first create an
N-body realization of a spherical Plummer model:
1% mkplummer i001.dat 1024
Note that we made use of the shortcut that out=
and nbody=
are the first two program keywords, and they
were assigned their value by position rather than by associated name.
We can now display the contents of the binary file i001.dat
with
tsf
:
2% tsf i001.dat
char Headline[33] "set_xrandom: seed used 706921861"
char History[36] "mkplummer i001.dat 1024 VERSION=2.5"
set SnapShot
set Parameters
int Nobj 01750
double Time 0.00000
tes
set Particles
int CoordSystem 0201402
double Mass[1024] 0.00195313 0.00195313 0.00195313
0.00195313 0.00195313 0.00195313 0.00195313 0.00195313
0.00195313 0.00195313 0.00195313 0.00195313 0.00195313
0.00195313 0.00195313 0.00195313 0.00195313 0.00195313
. . .
double PhaseSpace[1024][2][3] 4.92932 0.425103 -0.474249
0.342025 -0.112242 4.60796 -0.00388599 -0.389558 -0.958787
0.220561 0.213904 3.47561 0.0176012 1.22146 -0.903484
-0.705422 4.26963 -0.263561 1.04382 -0.199518 -0.480749
. . .
tes
tes
This is an example of a data-file from the N-body group, and consists of a single snapshot at time=0.0. This snapshot, with 1024 bodies with double precision masses and full six-dimensional phase space coordinates, totals 57606 bytes, whereas a straight dump of only the essential information would have been 57344 bytes (1024 bodies × 7 doubles × 8 bytes), a mere 0.5% overhead. The overhead will be larger with small amounts of data, e.g. diagnostics in an N-body simulation, or small N-body snapshots.
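The overhead figure can be checked with a short calculation. The sketch below assumes 8-byte doubles and takes the file size of 57606 bytes from the text; the variable names are chosen for illustration only.

```python
nbody = 1024
doubles_per_body = 1 + 2 * 3    # one mass plus PhaseSpace[2][3]
bytes_per_double = 8            # assumption: 8-byte double precision

raw_bytes = nbody * doubles_per_body * bytes_per_double
file_bytes = 57606              # total size quoted in the text
overhead_bytes = file_bytes - raw_bytes
overhead_percent = 100.0 * overhead_bytes / raw_bytes

print(raw_bytes, overhead_bytes, round(overhead_percent, 2))
```

The tags, set markers, headline and history strings add up to a few hundred bytes whatever the number of bodies, which is why the relative overhead grows for small snapshots.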
Besides some parameters in the Parameters
set, it consists
of a Particles
set, where (along with the type of coordinate system)
all the masses and phase space coordinates of all particles
are defined. Note the convention that integers written with a leading
0
are in octal representation. This is done for portability
reasons.
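When post-processing such a listing, the octal convention has to be taken into account. A minimal sketch in Python follows, assuming that every integer token with a leading 0 is octal and every other token is decimal; the helper name `parse_tsf_int` is hypothetical.

```python
def parse_tsf_int(token: str) -> int:
    """Interpret an integer token: a leading 0 means octal, otherwise decimal."""
    if len(token) > 1 and token.startswith("0"):
        return int(token, 8)
    return int(token, 10)


# The CoordSystem value from the listing above.
coord_system = parse_tsf_int("0201402")
print(coord_system, hex(coord_system))

# Converting back reproduces the original token.
round_trip = "0" + format(coord_system, "o")
print(round_trip)
```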
A comment about online help:
NEMO uses the Unix man(5) format
for more detailed online help,
although the inline help (the system help=
keyword)
is usually sufficient
to remind a novice user of the keywords and their meaning.
The man
command is a last resort, if more detailed information
and examples are needed.
3% man tsf
Note that, since the online manual page is a different file from the
source code, information in the manual page can easily get outdated, and
the inline (help=
) help, although very brief,
is more likely to be up to date since it is generated from the source
code (executable) itself:
4% tsf help=h
in : input file name [???]
maxprec : print nums with max precision [false]
maxline : max lines per item [4]
allline : print all lines (overrides maxline) [false]
indent : indentation of compound items [2]
margin : righthand margin [72]
item : Select specific item []
xml : output data in XML format? (experimental) [f]
octal : Force integer output in octal again? [f]
VERSION : 29-aug-02 PJT [3.1]
3.2. Pipes
In the UNIX operating system pipes can be very effectively used to pass information from one process to another. One of the well known textbook examples is how one gets a list of misspelled (or unknown) words from a document:
% spell file | sort | uniq | more
NEMO programs can also pass data via UNIX pipes, although with a
slightly different syntax: a dataset that is going to be part of a pipe
(either input or output) has to be designated with the -
(dash) symbol as its filename.
Also, and this is very important, the receiving task
at the other end of the pipe should get data from only one source.
If the task at the sending end of the pipe wants to send binary data over
that pipe, but in addition the same task would also write normal
standard
output, the pipe would be corrupted with two incompatible sources of
data. An example of this is the program snapcenter: when its binary output is sent into a pipe, the keyword report must be set to false, which is actually the default now.
So, for example, the output of a previous N-body
integration is re-centered on its center of mass, and subsequently
rectified and stacked into a single image as follows:
% snapcenter r001.dat . report=t | tabplot - 0 1,2,3
% snapcenter r001.dat - report=f |\
snaprect - - 'weight=-phi*phi*phi' |\
snapgrid - r001.sum stack=t
If the keyword report=f
had not been set,
snaprect
would not have been able to process its corrupted
input. Some other examples
are discussed in Section~ref{ss:data}.
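Why a second source of data corrupts a binary pipe can be illustrated with a toy stream in Python. The layout used here (a 4-byte count followed by 8-byte doubles) is invented for the illustration and is not NEMO's format; the in-memory `io.BytesIO` objects stand in for the pipe.

```python
import io
import struct


def write_payload(stream, values):
    """Write a count followed by that many 8-byte doubles (a toy binary layout)."""
    stream.write(struct.pack("<i", len(values)))
    stream.write(struct.pack(f"<{len(values)}d", *values))


def read_payload(stream):
    """Read back the toy layout; raise ValueError if the stream is not intact."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated header")
    (count,) = struct.unpack("<i", header)
    if not 0 <= count <= 1_000_000:
        raise ValueError("implausible item count; stream is corrupted")
    body = stream.read(8 * count)
    if len(body) != 8 * count:
        raise ValueError("truncated data")
    return list(struct.unpack(f"<{count}d", body))


values = [1.0, 2.0, 3.0]

# Clean pipe: only binary data goes down the stream.
clean = io.BytesIO()
write_payload(clean, values)
clean.seek(0)
clean_result = read_payload(clean)

# Mixed pipe: a text report is written to the same stream first.
mixed = io.BytesIO()
mixed.write(b"center: 0.1 0.2 0.3\n")
write_payload(mixed, values)
mixed.seek(0)
try:
    mixed_result = read_payload(mixed)
except ValueError:
    mixed_result = None

print(clean_result, mixed_result)
```

In the mixed case the reader interprets the first four characters of the report as an item count, so nothing that follows can be decoded.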
3.3. History of Data Reduction
Most programs
in NEMO will automatically keep track of the history of
their data-files in a self-describing and self-documenting
way. If a program modifies an input file and produces an
output file, it will prepend the
command-line with which it was invoked to its data history. The
data history is normally located at the beginning of a data file.
Comments entered using the frequently used program keyword
headline=
will also appear in the history section of your data file.
A utility, hisf,
can be used to display the history of a data-file.
This utility can also be used to create a pure history file (without any
data) by using the optional out=
and text=
keywords. Of
course tsf
could also be used by scanning its output for the string
History
or Headline
:
5% tsf r001.dat | grep History
which shows that tsf, together with its counterpart rsf, has virtually the same functionality as hisf.
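The same filtering can be done in a script. The sketch below extracts the Headline and History strings from text in the layout shown in the tsf listing earlier in this chapter; it assumes each string fits on one line, which may not hold for long histories, and the function name `history_items` is hypothetical.

```python
import re

# Matches lines such as: char History[36] "mkplummer i001.dat 1024 VERSION=2.5"
ITEM = re.compile(r'^char (History|Headline)\[\d+\] "(.*)"$')


def history_items(text):
    """Return (tag, string) pairs for the History and Headline items."""
    found = []
    for line in text.splitlines():
        match = ITEM.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# A few lines copied from the tsf listing shown earlier.
tsf_output = '''char Headline[33] "set_xrandom: seed used 706921861"
char History[36] "mkplummer i001.dat 1024 VERSION=2.5"
set SnapShot
  set Parameters
    double Time 0.00000
  tes
tes
'''

items = history_items(tsf_output)
for tag, text in items:
    print(f"{tag}: {text}")
```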
3.4. Table format
Many programs are capable of producing standard output in (ASCII) tabular format. The output can be gathered into a file using standard UNIX I/O redirection. In the example
6% radprof r001.dat tab=true > r001.tab
the file r001.tab
will contain (amongst others) columns with
surface density and radius from the snapshot r001.dat
. These
(ASCII) table files can be used by various programs for further
display and analysis. NEMO also has a few programs available for this
purpose (e.g. tabhist
for analysis and histogram
plotting, tablsqfit
for checking correlations between two columns, and
tabmath
for general table handling).
The manual
pages of the relevant NEMO programs should inform you how to get nice
tabular output, but sometimes it is also necessary to write a shell/awk
script or parser to do the job.
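Such a parser can be quite short. The following Python sketch reads a whitespace-separated table into columns; it assumes that `#` starts a comment and that all rows have the same number of numeric columns. The function name `read_table` and the sample values are made up for illustration and are not output from radprof.

```python
def read_table(text, comment="#"):
    """Parse whitespace-separated numeric rows into a list of columns."""
    rows = []
    for line in text.splitlines():
        line = line.split(comment, 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        rows.append([float(word) for word in line.split()])
    if not rows:
        return []
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("rows have differing numbers of columns")
    return [list(col) for col in zip(*rows)]


sample = """# radius  density   (made-up values for illustration)
0.1  2.50
0.2  1.75
0.4  0.80
"""

radius, density = read_table(sample)
mean_density = sum(density) / len(density)
print(radius, density, mean_density)
```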
A useful (open source) program, redir(1NEMO), has been included in NEMO. The command
7% redir -e debug.out tsf r001.dat debug=2
would run the tsf
command, but redirect the
standard error output (stderr) to the file debug.out
. There are
ways in the C-shell to do the same thing, but they are
clumsy and hard to remember. In the bash
shell this is accomplished much more easily:
7$ tsf r001.dat debug=2 2>debug.out
One last word of caution regarding tables: tables can also be used
very effectively in pipes. For example, print a column from the snapshot used above with snapprint
and pipe the output into tabhist
to get a quick look
at its distribution:
8% snapprint r001.dat r | tabhist -
In older versions of NEMO the number of lines that tabhist could read from a pipe was limited, but as of NEMO version 4.4 tables can be arbitrarily large.
3.5. Dynamically Loadable Functions
A very peculiar data file format encountered in NEMO is that of the
function descriptors. They present themselves to the user through
one or more keywords, and in reality point to a compiled
piece of code that will get loaded by NEMO (using loadobj(3NEMO)).
They normally live in $NEMOOBJ
.
We currently have 4 of these in NEMO:
3.5.1. Potential Descriptors
The potential descriptor is used in orbit
calculations and a few N-body programs. These are actually binary
object files (hence extremely system dependent!!), and
used by the dynamic object loader
during runtime. Potentials are
supplied to NEMO programs as an input variable (i.e. a set of
keywords, normally called potname=, potpars= and potfile=).
For this, a mechanism is needed to dynamically load
the code which calculates the potential. This is done by a
dynamic object loader that comes with NEMO.
If a program needs a potential, and it is present in the
default repository ($POTPATH
or $NEMOOBJ/potential
), it is
directly loaded into memory by this dynamic object loader.
If only a source file is present,
e.g. in the current directory, it is compiled on the fly
and then loaded. The source code can be written
in C or FORTRAN. Rules and more information
can be found in potential(3NEMO) and potential(5NEMO).
The program potlist(1NEMO)
can be used to test potential descriptors.
3.5.2. Bodytrans Functions
Another family of object files used by the dynamic
object loader are the bodytrans(5NEMO) functions. These were
actually the first of this kind introduced in NEMO.
They are functions generated from expressions containing body-variables
(mass, position, potential, time, ordinal number etc.). They frequently occur
in programs where it is desirable to have an arbitrary
expression of body variables,
e.g. plotting, printing and sorting programs.
Expressions which are not in the standard repository (currently
$BTRPATH
or $NEMOOBJ/bodytrans
) will
be generated on the fly and saved for later use.
The program bodytrans(1NEMO) is available
to test and save new expressions. Examples are given in
Section~ref{s-dispanal}; a table of the
precompiled ones is in Table~ref{t:bodytrans}.
3.5.3. Nonlinear Least Squares Fitting Functions
The program tabnllsqfit(1NEMO) can fit (linear or non-linear, depending
on the parameters) a function to a set of datapoints from an ASCII table.
The keyword fit=
describes the model (e.g. a line, plane, gaussian, circle,
etc.), of which a few common ones have been pre-compiled with the program.
In that sense this is different from the previous two function descriptors,
which always get loaded from a directory with precompiled object files.
The keyword load=
can be used to feed a user defined function to
this program. The manual page has a lot more details.
3.5.4. Rotation Curves Fitting Functions
Very similar to the Nonlinear Least Squares Fitting Functions are the Rotation Curves Fitting Functions, except that they are specific to the 1- and 2-dimensional rotation curves one finds in galaxies as the result of a projected circular streaming model. The program rotcurshape(1NEMO) is the only program that uses these functions; its manual page has a lot more details.
3.6. FITS
FITS is one of the earliest data formats used in (observational) astronomy. NEMO handles some of the conversions to and from FITS and also provides some FITS utilities.
For the SDFITS extension there is limited support in sdinfo and scanfits.