.. _filestr:

Filestructure
=============

.. note::
   NEMO stores its persistent data in binary files, which under most circumstances can also be
   used in a Unix pipe by using the ``-`` symbol. The ``tsf`` program will show the contents of
   such files in more human readable form.

Here we give an overview of the file structure of NEMO's persistent data stored on disk.
The popular memory (object) models, and how they interact with persistent data on disk,
are discussed in :ref:`progr`.

Most of the data handled by NEMO is stored in a specially designed XML-like binary format
(designed well before XML was conceived), although exceptions such as ASCII files and tables
will also be discussed. Ample examples illustrate creation, manipulation and data transfer.
We also mention a few examples of function descriptors, a data format that makes use of the
native object file format of your operating system (*a.out(5)* and *dlopen(3)*) and is
dynamically loaded at runtime.

Binary Structured Files
-----------------------

.. note::
   There is also a program called **bsf**, which benchmarks a regression value of the
   floating point values in the file.

Most of the data files used by NEMO share a common low level binary file structure,
which can be viewed as a sequence of tagged data items. Special symbols are defined
to group these items hierarchically into sets. Data items are typically scalar values
or homogeneous arrays constructed from elementary C data types, but the programmer
can also add more complex structures, such as C's ``struct`` structure definition,
or any user defined data structure. In this last case tagging by type is no longer
possible, and support for a machine independent format is not guaranteed.
Using such constructs is not recommended if the data needs to be portable across platforms.

The hierarchical structure of a binary file in this general format can be viewed in
human-readable format at the terminal using a special program,
``tsf`` ("*type structured file*").
Its counterpart, ``rsf`` ("*read structured file*"), converts such human-readable files
(in a special ASCII Structured File format, ASF) into binary structured files (BSF).
In principle it is hence possible to transfer data files between different types of
computers using ``rsf`` and ``tsf`` (see examples in Section~\ref{s:exch-data}).

Let us start with a small example: with the NEMO program ``mkplummer`` we first create
an N-body realization of a spherical Plummer model:

.. code-block::

    1% mkplummer i001.dat 1024

Note that we made use of the shortcut that ``out=`` and ``nbody=`` are the first two
*program keywords*, so they were assigned their values by position rather than by
associated name.

We can now display the contents of the binary file ``i001.dat`` with ``tsf``:

.. code-block::

    2% tsf i001.dat
    char Headline[33] "set_xrandom: seed used 706921861"
    char History[36] "mkplummer i001.dat 1024 VERSION=2.5"
    set SnapShot
      set Parameters
        int Nobj 01750
        double Time 0.00000
      tes
      set Particles
        int CoordSystem 0201402
        double Mass[1024] 0.00195313 0.00195313 0.00195313 0.00195313
          0.00195313 0.00195313 0.00195313 0.00195313 0.00195313 0.00195313 0.00195313
          0.00195313 0.00195313 0.00195313 0.00195313 0.00195313 0.00195313 0.00195313
          . . .
        double PhaseSpace[1024][2][3] 4.92932 0.425103 -0.474249 0.342025
          -0.112242 4.60796 -0.00388599 -0.389558 -0.958787 0.220561
          0.213904 3.47561 0.0176012 1.22146 -0.903484 -0.705422
          4.26963 -0.263561 1.04382 -0.199518 -0.480749
          . . .
      tes
    tes

This is an example of a data file from the N-body group, and consists of a single *snapshot*
at time=0.0. This snapshot, with 1024 bodies with double precision masses and full
6 dimensional phase space coordinates, totals 57606 bytes, whereas a straight dump of only
the essential information would have been 57344 bytes, a mere 0.5% overhead. The overhead
will be larger with small amounts of data, *e.g.* diagnostics in an N-body simulation,
or small N-body snapshots.
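
The byte counts quoted above can be checked with a little arithmetic. The following is a
minimal sketch in Python, assuming 8-byte doubles and counting only the masses and phase
space coordinates as essential data; the variable names are for illustration only and are
not part of NEMO.

.. code-block:: python

    # Check the storage overhead quoted for the 1024-body snapshot.
    # Assumption: a C double occupies 8 bytes; all names here are illustrative.
    nbody = 1024
    bytes_per_double = 8

    mass_bytes = nbody * bytes_per_double            # Mass[1024]
    phase_bytes = nbody * 2 * 3 * bytes_per_double   # PhaseSpace[1024][2][3]
    raw_bytes = mass_bytes + phase_bytes             # essential information only

    file_bytes = 57606                               # total size quoted in the text
    overhead_bytes = file_bytes - raw_bytes
    overhead_percent = 100.0 * overhead_bytes / raw_bytes

    print(raw_bytes, overhead_bytes, round(overhead_percent, 2))

The raw data come to 57344 bytes, so the headers, tags and history account for the
remaining 262 bytes, or roughly 0.46%.
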
Besides some parameters in the ``Parameters`` set, the snapshot consists of a ``Particles`` set,
where (along with the type of coordinate system) the masses and phase space
coordinates of all particles are defined. Note the convention that integers starting with
a ``0`` are in octal representation. This is done for portability reasons.

A comment about **online** help: NEMO uses the Unix *man(5)* format for more detailed
online help, although the **inline** help (system ``help=`` keyword) is usually
sufficient to remind a novice user of the keywords and their meaning.
The ``man`` command is a last resort, if more detailed information and examples are needed.

.. code-block::

    3% man tsf

Note that, since the online manual page is a different file from the source code,
information in the manual page can easily get outdated. The inline (``help=``) help,
although very brief, is more likely to be up to date since it is generated from the
source code (executable) itself:

.. code-block::

    4% tsf help=h
    in               : input file name  [???]
    maxprec          : print nums with max precision  [false]
    maxline          : max lines per item  [4]
    allline          : print all lines (overrides maxline)  [false]
    indent           : indentation of compound items  [2]
    margin           : righthand margin  [72]
    item             : Select specific item []
    xml              : output data in XML format? (experimental) [f]
    octal            : Force integer output in octal again? [f]
    VERSION          : 29-aug-02 PJT  [3.1]

Pipes
-----

In the UNIX operating system pipes can be used very effectively to pass information from
one process to another. One of the well known textbook examples is how one gets a list of
misspelled (or unknown) words from a document:

.. code-block::

    % spell file | sort | uniq | more

NEMO programs can also pass data via UNIX pipes, although with a slightly different syntax:
a dataset that is going to be part of a pipe (either input or output) has to be
designated with the ``-`` (*dash*) symbol for its filename.
Also, and this is very important, the receiving task at the other end of the pipe should
get data from only one source. If the task at the sending end writes binary data to the
pipe and also writes *normal* standard output, the pipe is corrupted by two incompatible
sources of data. An example of this is the program ``snapcenter``. Its keyword ``report``
must be set to ``false`` when the output goes into a pipe; this is actually the default now.
So, for example, the output of a previous N-body integration is re-centered on its center
of mass, and subsequently rectified and stacked into a single image as follows:

.. code-block::

    % snapcenter r001.dat . report=t | tabplot - 0 1,2,3

    % snapcenter r001.dat - report=f |\
         snaprect - - 'weight=-phi*phi*phi' |\
         snapgrid - r001.sum stack=t

If ``report=f`` had not been set, ``snaprect`` would not have been able to process its
corrupted input. Some other examples are discussed in Section~\ref{ss:data}.

History of Data Reduction
-------------------------

Most programs in NEMO will automatically keep track of the history of their data files
in a self-describing and self-documenting way. If a program modifies an input file and
produces an output file, it will prepend the command line with which it was invoked to
its data history. The data history is normally located at the beginning of a data file.
Comments entered using the frequently used program keyword ``headline=`` will also appear
in the history section of your data file.

A utility, ``hisf``, can be used to display the history of a data file. This utility can
also be used to create a pure history file (without any data) by using the optional
``out=`` and ``text=`` keywords. Of course ``tsf`` could also be used, by scanning its
output for the string ``History`` or ``Headline``:

.. code-block::

    5% tsf r001.dat | grep History

which shows that ``tsf``, together with its counterpart ``rsf``, has virtually the same
functionality as ``hisf``.

Table format
------------

Many programs are capable of producing standard output in (ASCII) tabular format.
The output can be gathered into a file using standard UNIX I/O redirection.
In the example

.. code-block::

    6% radprof r001.dat tab=true > r001.tab

the file ``r001.tab`` will contain (amongst others) columns with surface density and
radius from the snapshot ``r001.dat``. These (ASCII) *table* files can be used by various
programs for further display and analysis. NEMO also has a few programs available for this
purpose (*e.g.* ``tabhist`` for analysis and histogram plotting, ``tablsqfit`` for checking
correlations between two columns, and ``tabmath`` for general table handling).
The manual pages of the relevant NEMO programs should inform you how to get nice tabular
output, but sometimes it is also necessary to write a shell/awk script or parser to do
the job.

A useful (open source) program *redir(1NEMO)* has been included in NEMO:

.. code-block::

    7% redir -e debug.out tsf r001.dat debug=2

would run the ``tsf`` command, while redirecting the *stderr* standard error output to a
file ``debug.out``. There are ways in the C-shell to do the same thing, but they are clumsy
and hard to remember. In the bash shell this is accomplished much more easily:

.. code-block::

    7$ tsf r001.dat debug=2 2>debug.out

One last word regarding tables: tables can also be used very effectively in pipes.
For example, pipe the output of ``snapprint`` into ``tabhist`` to get a quick look at
the distribution:

.. code-block::

    8% snapprint r001.dat r | tabhist -

In older versions of NEMO the number of lines that ``tabhist`` could read from a pipe was
limited, but as of NEMO version 4.4 tables can be arbitrarily large.
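
Where a small parser is needed, as mentioned earlier in this section, a few lines of
Python are often enough. The sketch below assumes a table with whitespace-separated
numeric columns and comments starting with ``#``; the function name ``read_table`` and
the sample data are hypothetical and not part of NEMO.

.. code-block:: python

    def read_table(lines, comment="#"):
        """Return a list of columns (lists of floats) from whitespace-separated text."""
        rows = []
        for line in lines:
            line = line.split(comment, 1)[0].strip()   # drop comments and padding
            if not line:
                continue                                # skip empty or comment-only lines
            rows.append([float(word) for word in line.split()])
        # transpose rows into columns; zip stops at the shortest row
        return [list(col) for col in zip(*rows)]

    # Made-up sample data standing in for the contents of a table file
    sample = [
        "# radius  density",
        "0.1  2.50",
        "0.2  1.75",
        "",
        "0.4  0.60",
    ]
    radius, density = read_table(sample)
    print(radius, density)

To read a real table, an open file object can be passed instead of the list, since
iterating over a file yields its lines.
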

Dynamically Loadable Functions
------------------------------

A very peculiar data file format encountered in NEMO is that of the function descriptors.
They present themselves to the user through one or more keywords, and in reality point to
a compiled piece of code that will get loaded by NEMO (using *loadobj(3NEMO)*).
They normally live in ``$NEMOOBJ``. We currently have 4 of these in NEMO:

Potential Descriptors
~~~~~~~~~~~~~~~~~~~~~

The potential descriptor is used in orbit calculations and a few N-body programs.
These are actually binary object files (hence extremely system dependent), which are
used by the dynamic object loader at runtime.

Potentials are supplied to NEMO programs as an input variable (*i.e.* a set of keywords,
normally called ``potname=``, ``potpars=`` and ``potfile=``). For this, a mechanism is
needed to dynamically load the code which calculates the potential. This is done by a
dynamic object loader that comes with NEMO. If a program needs a potential, and it is
present in the default repository (``$POTPATH`` or ``$NEMOOBJ/potential``), it is directly
loaded into memory by this dynamic object loader. If only a source file is present,
*e.g.* in the current directory, it is compiled on the fly and then loaded.
The source code can be written in C or FORTRAN. Rules and more information can be found
in *potential(3NEMO)* and *potential(5NEMO)*. The program *potlist(1NEMO)* can be used to
test potential descriptors.

Bodytrans Functions
~~~~~~~~~~~~~~~~~~~

Another family of object files used by the dynamic object loader are the
*bodytrans(5NEMO)* functions. These were actually the first of this kind introduced in
NEMO. They are functions generated from expressions containing body variables (mass,
position, potential, time, ordinal number, etc.). They frequently occur in programs where
it is desirable to have an arbitrary expression of body variables, *e.g.* plotting and
printing programs, sorting programs, etc.
Expressions which are not in the standard repository (currently ``$BTRPATH`` or
``$NEMOOBJ/bodytrans``) will be generated on the fly and saved for later use.
The program *bodytrans(1NEMO)* is available to test and save new expressions.
Examples are given in Section~\ref{s-dispanal}; a table of the precompiled ones is in
Table~\ref{t:bodytrans}.

Nonlinear Least Squares Fitting Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The program *tabnllsqfit(1NEMO)* can fit (linear or non-linear, depending on the
parameters) a function to a set of data points from an ASCII table. The keyword ``fit=``
describes the model (*e.g.* a line, plane, gaussian, circle, etc.), of which a few common
ones have been pre-compiled with the program. In that sense this is different from the
previous two function descriptors, which always get loaded from a directory with
precompiled object files. The keyword ``load=`` can be used to feed a user defined
function to this program. The manual page has a lot more details.

Rotation Curves Fitting Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Very similar to the Nonlinear Least Squares Fitting Functions are the Rotation Curves
Fitting Functions, except that they are specific to the 1- and 2-dimensional rotation
curves one finds in galaxies as the result of a projected circular streaming model.
The program *rotcurshape(1NEMO)* is the only program that uses these functions;
its manual page has a lot more details.

.. include:: fits.rst