Table of Contents


filestruct - primitives for structured binary file I/O


#include <stdinc.h>
#include <filestruct.h>
bool get_tag_ok(str, tag)
void get_data(str, tag, typ, dat, dimN, ..., dim1, 0)
void get_data_coerced(str, tag, typ, dat, dimN, ..., dim1, 0)
string get_string(str, tag)
void get_set(str, tag)
void get_tes(str, tag)
void put_data(str, tag, typ, dat, dimN, ..., dim1, 0)
void put_string(str, tag, msg)
void put_set(str, tag)
void put_tes(str, tag)

void get_data_set(str, tag, typ, dat, dimN, ..., dim1, 0)
void get_data_ran(str, tag, dat, offset, length)
void get_data_blocked(str, tag, dat, length)
void get_data_tes(str, tag)
void put_data_set(str, tag, typ, dat, dimN, ..., dim1, 0)
void put_data_ran(str, tag, dat, offset, length)
void put_data_blocked(str, tag, dat, length)
void put_data_tes(str, tag)

string get_type(str, tag)
int *get_dims(str, tag)
int get_dlen(str, tag)

void strclose(str)
bool qsf(str)

stream str;
string tag;
int typ;
byte *dat;
int dimN, ..., dim1;
string msg;
int offset, length;


These routines provide a simple yet reasonably general mechanism for the structured input and output of binary data. A structured binary file may be viewed as a sequence of tagged data items (much like the more modern XML files); special symbols are introduced to group items hierarchically into sets. Data items are typically scalar values or homogeneous arrays constructed from the elementary C data types: char, short, int, long, float, double. filestruct.h defines the following symbolic names for these types:
CharType      standard C characters, assumed printable.
ByteType      like C characters, but unsigned and unprintable.
AnyType       anything at all; see below.
ShortType     standard C short integers.
IntType       standard C integers.
LongType      standard C long integers.
HalfpType     16 bit floating point half precision (non-standard, cf. EXR)
FloatType     standard C floating point.
DoubleType    standard C double-precision.

The first three types are all synonyms for CharType, but the meanings conveyed are quite different. CharType is reserved for strings of legible characters, while ByteType identifies data in 8-bit binary chunks. AnyType, while operationally identical, implies that data (typically in an array) may not naturally divide on 8-bit boundaries; this type currently provides an escape hatch for structure I/O (see the example below).

get_tag_ok(str, tag) is used to determine if a subsequent get_data(), get_string(), or get_set() call will succeed in finding an item named tag in the structured binary input stream str. The algorithm used to determine this depends on whether the structured input point is at top level or within a set. At top level, the next item must match the specified tag. Within a set, the input point is effectively rewound to the first item of the set, and the entire set is scanned for the tag. get_tag_ok() returns FALSE on end of file.
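The two search rules can be illustrated with a minimal sketch. It does not use the filestruct library: struct item_list and tag_ok() are hypothetical stand-ins that model a run of tagged items as an in-memory array, under the assumption that only the tags matter for the lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a run of tagged items. */
struct item_list {
    const char **tags;   /* item tags, in file order */
    size_t ntags;        /* number of items */
    size_t next;         /* index of the next unread item */
    bool in_set;         /* true if the items are the contents of a set */
};

/* Mimic the lookup rule described for get_tag_ok(). */
bool tag_ok(const struct item_list *il, const char *tag)
{
    size_t i;

    if (!il->in_set)     /* top level: only the next item is considered */
        return il->next < il->ntags &&
               strcmp(il->tags[il->next], tag) == 0;

    for (i = 0; i < il->ntags; i++)   /* in a set: scan from the first item */
        if (strcmp(il->tags[i], tag) == 0)
            return true;
    return false;
}
```

In this model, a tag that has already been passed is not found at top level but is still found inside a set, and an exhausted top-level list (end of file) always yields false.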

get_data(str, tag, typ, dat, dimN, ..., dim1, 0) transfers data from a structured binary input stream str to a scalar or homogeneous array at address dat. First an item named tag is found with the algorithm described above. The type of the item is checked against typ, and the dimensions (if any) are checked against arguments dimN, ..., dim1. If they match, the item data is copied to the specified address.
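The zero-terminated dimension list is an ordinary C variadic argument list. The following sketch shows one way such a list could be collected; collect_dims() is a hypothetical helper, not part of filestruct, and it assumes all dimensions are positive ints.

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Collect a zero-terminated list of int dimensions, outermost first,
 * into dims[] (which needs room for maxdims + 1 ints) and return the
 * total number of elements. A scalar, i.e. just the terminating 0,
 * gives a count of 1. Returns 0 if more than maxdims dimensions are
 * supplied.
 */
size_t collect_dims(int *dims, int maxdims, ...)
{
    va_list ap;
    size_t count = 1;
    int n = 0, d;

    va_start(ap, maxdims);
    while ((d = va_arg(ap, int)) != 0) {
        if (n >= maxdims) {       /* too many dimensions for dims[] */
            va_end(ap);
            return 0;
        }
        dims[n++] = d;
        count *= (size_t) d;
    }
    va_end(ap);
    dims[n] = 0;                  /* keep the list zero-terminated */
    return count;
}
```

Because va_arg() has no way to detect the end of the list, omitting the final 0 makes the loop read past the supplied arguments; this is why the terminator is mandatory even for scalars.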

get_data_coerced(str, tag, typ, dat, dimN, ..., dim1, 0) performs the same function as get_data(), except that the type of the item and the type requested by typ may differ in one way: the item may be FloatType while typ is DoubleType, or the other way around. If typ matches the type of the item, this function is identical to get_data(); if a conversion other than Float->Double or Double->Float is attempted, an error is signaled.
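The two permitted conversions amount to an element-by-element cast. The sketch below is independent of filestruct; the function names are hypothetical and only illustrate what each direction does to the values.

```c
#include <stddef.h>

/* Widen n floats to doubles; every float value is preserved exactly. */
void floats_to_doubles(const float *src, double *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (double) src[i];
}

/* Narrow n doubles to floats; values are rounded and may lose precision. */
void doubles_to_floats(const double *src, float *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (float) src[i];
}
```

Reading a DoubleType item into a float array is therefore lossy in general, while reading a FloatType item into a double array is not.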

get_string(str, tag) searches as above for an item named tag, which must contain a null-terminated array of characters. The data is copied to space allocated using malloc(3) and a pointer is returned.

get_set(str, tag) searches as above for a set named tag (in fact, the tag is carried by a special symbol used to mark the start of the set). The contents of this set are then taken as the scope of subsequent get_tag_ok(), get_data(), get_string() and get_set() calls.

get_tes(str, tag) terminates the scan of the current input set, and returns to the scope which was in effect before the set was accessed. If tag is not NULL, it must match the tag of the corresponding get_set() call. When input is from the top level, the input pointer is left before the next item in the input stream. (Note: tes is set backwards).
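The nesting of get_set() and get_tes() behaves like a stack of scopes. The following sketch models only that bookkeeping; struct scope_stack, scope_open(), scope_close() and MAX_DEPTH are hypothetical, and the sketch returns -1 where the library would report the problem through error(3).

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 16     /* hypothetical nesting limit */

struct scope_stack {
    const char *tags[MAX_DEPTH];  /* tags of the currently open sets */
    int depth;                    /* 0 means top level */
};

/* Enter a set. Returns 0, or -1 if the nesting is too deep. */
int scope_open(struct scope_stack *s, const char *tag)
{
    if (s->depth >= MAX_DEPTH)
        return -1;
    s->tags[s->depth++] = tag;
    return 0;
}

/*
 * Leave the innermost set. A NULL tag skips the check; otherwise the
 * tag must match the one given when the set was opened.
 * Returns 0 on success, -1 on a mismatch or when no set is open.
 */
int scope_close(struct scope_stack *s, const char *tag)
{
    if (s->depth == 0)
        return -1;
    if (tag != NULL && strcmp(s->tags[s->depth - 1], tag) != 0)
        return -1;
    s->depth--;
    return 0;
}
```

Passing the tag on close costs nothing and catches sets that are closed out of order.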

put_data(str, tag, typ, dat, dimN, ..., dim1, 0) is effectively the inverse of get_data() above: the data pointed to by dat, which is of type typ and dimensions dimN, ..., dim1, is emitted to the structured output stream str as an item named tag.

put_string(str, tag, msg) is the inverse of get_string() above.

put_set(str, tag) begins the output of a set named tag. The contents of the set are supplied by subsequent calls to put_data(), put_string(), and put_set().

put_tes(str, tag) terminates the output of a set.

strclose(str) is the preferred way to close binary streams used in the above operations; it need not be called unless the stream must be explicitly closed (for example, for later reuse). In case the stream was opened as a special one (e.g. a scratch stream, see stropen(3NEMO) ), strclose is the only means to properly clean up.

qsf(str) queries whether an input stream is a structured binary one. Since this requires reading data that may be needed later on, the function cannot be used with pipes; in that case qsf always returns FALSE. It is also left to the application programmer to reposition the file pointer (rewind(3) ) if the stream is subsequently to be used for input.

get_data_set and get_data_tes bracket random data access, which is achieved by get_data_ran. offset and length are both in units of the item-length. get_data_blocked is a pipe-safe alternative to get_data_ran, in which the I/O must occur sequentially.
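The unit convention can be sketched on an in-memory array. The sketch assumes that "item-length" means the size of one element of the item, so that offsets and lengths count elements rather than bytes; read_range() is a hypothetical helper and does not reflect how the library accesses the file.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `length` elements, starting at element `offset`, out of an item
 * holding `nelem` elements of `elemsize` bytes each.
 * Returns 0 on success, -1 if the range falls outside the item.
 */
int read_range(const void *item, size_t nelem, size_t elemsize,
               void *dat, size_t offset, size_t length)
{
    if (offset > nelem || length > nelem - offset)
        return -1;
    /* element units are converted to bytes only at this point */
    memcpy(dat, (const char *) item + offset * elemsize,
           length * elemsize);
    return 0;
}
```

Under this reading, fetching elements 3 and 4 of a double array means offset 3 and length 2, regardless of sizeof(double).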

get_type, get_dims, and get_dlen return the type, dimension array (allocated and zero terminated!), and data-length in bytes for the whole item. Programmers should rarely need these routines, though.


The following code fragment reads and later writes some data to structured binary files.
    stream instr, outstr;    /* assumed already opened, e.g. with stropen(3NEMO) */
    int nobj;
    double time, mass[4096], phase[4096][2][3];

    /* input: read the parameters, then the particle data */
    get_set(instr, "SnapShot");
    get_set(instr, "Parameters");
    get_data(instr, "Time", DoubleType, &time, 0);
    get_data(instr, "Nobj", IntType, &nobj, 0);
    get_tes(instr, "Parameters");
    get_set(instr, "Particles");
    if (get_tag_ok(instr, "Mass"))
        get_data(instr, "Mass", DoubleType, mass, nobj, 0);
    get_data(instr, "PhaseSpace", DoubleType, phase,
             nobj, 2, 3, 0);
    get_tes(instr, "Particles");
    get_tes(instr, "SnapShot");

    /* output: write the same structure */
    put_set(outstr, "SnapShot");
    put_set(outstr, "Parameters");
    put_data(outstr, "Time", DoubleType, &time, 0);
    put_data(outstr, "Nobj", IntType, &nobj, 0);
    put_tes(outstr, "Parameters");
    put_set(outstr, "Particles");
    put_data(outstr, "Mass", DoubleType, mass, nobj, 0);
    put_data(outstr, "PhaseSpace", DoubleType, phase,
             nobj, 2, 3, 0);
    put_tes(outstr, "Particles");
    put_tes(outstr, "SnapShot");

Notes: the first two calls to get_data() and put_data() illustrate the I/O of scalar data: although no dimensions are listed, the terminating 0 (zero) must appear in the arg list. Later calls show how arrays are specified. The Mass item will only be input if it appears in the Particles set.

Structures which do not contain pointer data can be handled using the AnyType, but with somewhat limited functionality: a structure of type foo is treated as an array of sizeof(foo) objects of type AnyType. This means, alas, that the contents of structures are hidden to utilities like tsf(1) . The following example shows how to handle structures:

    struct foo {
        int erupt;
        char actor;
        double trouble;
    } footab[64];
    get_data(instr, "FooTab", AnyType, footab,
             64, sizeof(struct foo), 0);
    put_data(outstr, "FooTab", AnyType, footab,
             64, sizeof(struct foo), 0);
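Treating a structure as sizeof(struct foo) raw bytes means that any padding the compiler inserts between members travels with the data. The sketch below shows this with plain memcpy() and no filestruct calls; foo_member_bytes() and foo_roundtrip() are hypothetical helpers written for illustration.

```c
#include <stddef.h>
#include <string.h>

struct foo {
    int erupt;
    char actor;
    double trouble;
};

/* Bytes taken by the named members alone, ignoring any padding. */
size_t foo_member_bytes(void)
{
    return sizeof(int) + sizeof(char) + sizeof(double);
}

/*
 * Copy n structures out to a raw byte buffer and back again, the way
 * an untyped byte-wise transfer would. buf must hold at least
 * n * sizeof(struct foo) bytes.
 */
void foo_roundtrip(const struct foo *src, struct foo *dst,
                   unsigned char *buf, size_t n)
{
    memcpy(buf, src, n * sizeof(struct foo));
    memcpy(dst, buf, n * sizeof(struct foo));
}
```

The round trip is exact on the machine that wrote the bytes. Since sizeof(struct foo) is at least foo_member_bytes() and the layout is chosen by the compiler, it seems safest to assume that such items are only readable by programs built with the same structure layout and byte order.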


Exceptional conditions (e.g., unexpected EOF), invalid arguments (e.g., types out of range) and low-level catastrophes (e.g., running out of memory) generate messages via error(3) , which will, in general, return the program to the operating system. This error-checking is implemented with the goal of freeing applications programmers from the responsibility of checking for I/O errors other than end-of-file.


The library delays reading large data items into memory: it stores only a pointer to their location until the data is actually requested via one of the get_data() routines.


Whenever pipes are used, all data is read into memory, as opposed to being deferred for input. This may lead to large memory consumption.

Random access can currently only take place in one item.


Joshua E. Barnes, Lyman P. Hurd, Peter Teuben

See Also

filestruct(5NEMO) , NEMO Users/Programmers Guide (half precision floating point)

Update History

4-Apr-87    original implementation        JEB
30-Aug-87    type coercion, deferred input    LPH
16-Apr-88    new types, operators, etc    JEB
16-May-92    random access to data       PJT
5-mar-94    documented qsf              PJT
2-jun-05    added blocked I/O        PJT
