get_atable, get_ftable, get_line, parse, strinsert - legacy table manipulator routines
#include <table.h>
#include <mdarray.h>

table *table_open(string fname, int rowbufsize, ? int maxrow);
mdarray2 table_md2(table *t);
void table_read(table *t);
string *table_comments(table *t);
void table_reset(table *t);
void table_close(table *t);

int table_nrows(table *t);
int table_ncols(table *t);
void table_set_valid_rows(int nrows, int *rows);
void table_set_valid_cols(int ncols, int *cols);

int table_next_row(table *t);
int table_next_rows(table *t);
int table_next_rowi(table *t);
int table_next_rowr(table *t);

string table_line(table *t);
string table_cols(table *t, int col);
int table_coli(table *t, int col);
real table_colr(table *t, int col);
string *table_colsp(table *t, int col);
int *table_colip(table *t, int col);
real *table_colrp(table *t, int col);

string table_row(table *t, int row);
string *table_rowsp(table *t, int row);
int *table_rowip(table *t, int row);
real *table_rowrp(table *t, int row);

void table_set_ncols(int ncols);

Old:

int get_atable(stream instr, int ncol, int *colnr, real **coldat, int ndat);
int get_ftable(stream instr, int ncol, int *colpos, string *colfmt, real **coldat, int ndat);   // check
int get_line(stream instr, char *line);
int parse(int linenr, char *line, double *dat, int ndat);
int strinsert(char *a, char *b, int n);
int iscomment(char *line);
table_open opens a file for reading. The returned table * pointer is used in all subsequent table_ routines. The rowbufsize parameter controls how many lines of the table are kept in internal buffers: a value of 0 means the whole table will be read into memory, a value of 1 means the table will be read line by line. Values larger than 1 are planned, but not yet supported. The maxrow parameter was used a lot in the old system, but can probably be dropped: it is normally only needed when the input file is a pipe and the whole file needs to be read, which is now supported. table_md2 is a convenient way to convert an ASCII table directly into a two-dimensional mdarray(3NEMO). With table_read the whole table is read into memory. Any comment lines at the start of the file will be saved in a separate set of comment lines, which can be extracted with table_comments. Finally, with table_close access to the table is closed and any associated memory is freed. In addition, table_reset can be used to reset array access (more on that later), in case the table needs to be re-read. For tables that are processed in streaming mode (e.g. filename="-") this will result in an error.
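To make the split between leading comment lines and table data concrete, here is a minimal standalone sketch in plain C. It does not use the NEMO library: the names sketch_table, push and sketch_read are hypothetical, and the rule that a header comment starts with '#', '!' or ';' is an assumption taken from the comment characters mentioned in this page. It only loosely mirrors what a rowbufsize=0 table_open followed by table_read and table_comments is described to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory table: leading comment lines and the remaining
 * lines are kept in separate NULL-terminated arrays. */
typedef struct {
    char **comments;
    char **lines;
    int ncomments, nlines;
} sketch_table;

/* Append s to a NULL-terminated array of *n strings (no error checking). */
static void push(char ***arr, int *n, char *s)
{
    *arr = realloc(*arr, (size_t)(*n + 2) * sizeof(char *));
    (*arr)[(*n)++] = s;
    (*arr)[*n] = NULL;
}

/* Read all of fp into memory. Lines starting with '#', '!' or ';' that
 * come before the first other line are stored as comments (assumption). */
sketch_table sketch_read(FILE *fp)
{
    sketch_table t = { calloc(1, sizeof(char *)), calloc(1, sizeof(char *)), 0, 0 };
    char *buf = NULL;
    size_t cap = 0;
    ssize_t len;
    int in_header = 1;

    while ((len = getline(&buf, &cap, fp)) != -1) {
        if (len > 0 && buf[len - 1] == '\n')
            buf[--len] = '\0';                     /* strip the newline */
        if (in_header && buf[0] != '\0' && strchr("#!;", buf[0]) != NULL) {
            push(&t.comments, &t.ncomments, strdup(buf));
        } else {
            in_header = 0;                         /* the header ends here */
            push(&t.lines, &t.nlines, strdup(buf));
        }
    }
    free(buf);
    return t;
}
```

Keeping the comments NULL-terminated matches the way the example further down iterates over table_comments with while (*sp).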
Once a table has been fully read into memory, table_nrows returns the number of rows, and table_ncols the number of columns. With table_set_valid_rows and/or table_set_valid_cols a subset of rows and/or columns can be selected for conversion; this also redefines nrows and ncols. When table_reset is called, these values revert to their original values.
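As an illustration of how selecting columns redefines ncols, the following standalone sketch copies a chosen list of columns out of a row-major real array. The function select_cols is hypothetical and not part of NEMO; in particular, the 0-based column indices used here are an assumption, and table_set_valid_cols may well use a different convention.

```c
#include <stdlib.h>

/* Hypothetical sketch: copy the columns listed in cols[0..ncols_sel-1]
 * (0-based, an assumption) out of a row-major nrows x ncols table into a
 * new nrows x ncols_sel table. The caller frees the result. */
double *select_cols(const double *data, int nrows, int ncols,
                    int ncols_sel, const int *cols)
{
    double *out = malloc((size_t)nrows * (size_t)ncols_sel * sizeof(double));
    if (out == NULL)
        return NULL;
    for (int i = 0; i < nrows; i++)
        for (int k = 0; k < ncols_sel; k++)
            out[i * ncols_sel + k] = data[i * ncols + cols[k]];
    return out;
}
```

After such a selection the effective column count is ncols_sel, and the columns appear in the order given, so cols can also reorder them.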
If the table is parsed line by line, some routines will not be accessible, since the table is not in memory.
With table_next_row the next line is read. It returns -1 at end of file, 0 when the line is blank or contains no data (it may still contain a comment, e.g. a line starting with #, ! or ;), and 1 when a data line was read. No parsing is done. To also parse the line, use table_next_rows, table_next_rowi, or table_next_rowr, which tokenize the line into values of a single type (string, int or real, respectively). The last line read is always stored internally, and a pointer to it can be retrieved with table_line for more refined parsing by the user.
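The -1/0/1 return convention can be sketched with POSIX getline(3) alone. The function next_row below is a hypothetical stand-in for table_next_row, not NEMO code; treating leading spaces and tabs as insignificant before checking for a comment character is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the table_next_row convention:
 * -1 at end of file, 0 for a blank or comment-only line, 1 for a data line.
 * The line (newline stripped) is left in *buf, playing the role of table_line(). */
int next_row(FILE *fp, char **buf, size_t *cap)
{
    ssize_t len = getline(buf, cap, fp);
    if (len == -1)
        return -1;                                   /* end of file */
    if (len > 0 && (*buf)[len - 1] == '\n')
        (*buf)[--len] = '\0';
    const char *p = *buf + strspn(*buf, " \t\r");    /* skip leading blanks */
    if (*p == '\0' || strchr("#!;", *p) != NULL)
        return 0;                                    /* blank or comment */
    return 1;                                        /* data line, not parsed */
}
```

A caller loops while next_row(...) >= 0 and only processes lines for which it returned 1, which is the pattern used in the line-by-line example further down.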
Depending on which of the three types the line was parsed as, individual column elements can be retrieved with table_cols, table_coli, or table_colr, and if the whole table is available in memory, entire columns can also be retrieved via table_colsp, table_colip, or table_colrp.
The currently parsed row can be retrieved in full with (again, depending on type) table_rowsp, table_rowip, or table_rowrp, where the row argument is ignored if the table is parsed row by row.
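Conceptually, parsing a row as reals turns one line into an array from which single columns are then indexed. The sketch below shows that idea with strtod(3); parse_row_reals is a hypothetical helper, whitespace-separated fields are an assumption, and the 0-based row[col] indexing is only this sketch's convention, not necessarily that of table_colr.

```c
#include <stdlib.h>

/* Hypothetical sketch: tokenize one whitespace-separated line into reals.
 * Returns the number of values stored in row[0..maxcol-1]; parsing stops
 * at the first token that is not a number. */
int parse_row_reals(const char *line, double *row, int maxcol)
{
    int n = 0;
    char *end;
    while (n < maxcol) {
        double v = strtod(line, &end);   /* skips leading whitespace */
        if (end == line)
            break;                       /* no more numbers */
        row[n++] = v;
        line = end;
    }
    return n;
}
```

With the row in hand, "column col" is simply row[col], and the whole row corresponds to what table_rowrp is described to return.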
A possible future routine is table_set_ncols, to cover the case where a row spans multiple lines. By default each line is one row of the table.
The original legacy table routines remain available:
Both get_atable and get_ftable parse an ASCII table, read from the stream instr, into memory as ncol columns and up to ndat rows of real numbers. The input table may contain comment lines, as well as columns that are not numbers. Badly parsed lines are simply skipped. The other parameters common to both routines are coldat, ncol and ndat: coldat is an array of ncol pointers to previously allocated data, each of ndat real elements. The number of valid rows read is returned. If this number is negative, more data is available but could not be read because ndat was exhausted. On the next call ndat must then be set negative, to recover the last line read on the previous call and continue reading the table without missing a line. CAVEAT: this only works if instr has not changed.
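The coldat layout (ncol pointers, each to ndat reals) is easy to get wrong, so here is a minimal sketch of allocating and freeing it. alloc_coldat and free_coldat are hypothetical helpers, and typedef double real is an assumption standing in for NEMO's own real type, which depends on how NEMO was configured.

```c
#include <stdlib.h>

typedef double real;   /* assumption: NEMO's real is a double here */

/* Allocate ncol columns of ndat reals each, as expected for coldat. */
real **alloc_coldat(int ncol, int ndat)
{
    real **coldat = malloc((size_t)ncol * sizeof(real *));
    if (coldat == NULL)
        return NULL;
    for (int i = 0; i < ncol; i++) {
        coldat[i] = malloc((size_t)ndat * sizeof(real));
        if (coldat[i] == NULL) {          /* unwind on failure */
            while (i-- > 0)
                free(coldat[i]);
            free(coldat);
            return NULL;
        }
    }
    return coldat;
}

void free_coldat(real **coldat, int ncol)
{
    for (int i = 0; i < ncol; i++)
        free(coldat[i]);
    free(coldat);
}
```

Column k of row i then lives in coldat[k][i], so each column is a contiguous array that can be handed directly to other routines.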
get_atable parses the table in free format. colnr is an array of length ncol with the column numbers to read (1 being the first column). If any colnr is 0, it is interpreted as referring to the line number in the original input file (including/excluding comment and empty lines), 1 being the first line, and the corresponding entry in coldat is set to that line number. Columns are separated by whitespace or commas.
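A minimal sketch of the free-format rules for a single line follows: whitespace or commas separate fields, colnr is 1-based, and colnr 0 means "store the line number". parse_free_line is a hypothetical helper, not the gettab.c implementation; the fixed buffer and token limits are simplifications, and because strtok(3) collapses repeated separators, empty fields such as ",," are silently dropped here.

```c
#include <stdlib.h>
#include <string.h>

#define SEPS " \t\r\n,"   /* whitespace or commas separate columns */

/* Hypothetical sketch: pick columns colnr[0..ncol-1] (1-based; 0 = line
 * number) out of one line. Returns 1 on success, 0 if the line should be
 * skipped (missing column or non-numeric field). */
int parse_free_line(const char *line, int lineno, int ncol,
                    const int *colnr, double *out)
{
    char buf[1024];
    char *tok[64];
    int ntok = 0;

    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *p = strtok(buf, SEPS); p != NULL && ntok < 64; p = strtok(NULL, SEPS))
        tok[ntok++] = p;

    for (int k = 0; k < ncol; k++) {
        if (colnr[k] == 0) {
            out[k] = lineno;                 /* line number requested */
            continue;
        }
        if (colnr[k] > ntok)
            return 0;                        /* column missing */
        char *end;
        out[k] = strtod(tok[colnr[k] - 1], &end);
        if (*end != '\0')
            return 0;                        /* not a number: skip line */
    }
    return 1;
}
```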
get_ftable parses the table in fixed format. colpos is an array with the positions in each row at which to start reading (1 being the first position), and colfmt an array of pointers to the format strings used to parse a real number (note that real normally requires %lf). If any colpos is 0, it is interpreted as referring to the line number in the original input file (including comment lines), 1 being the first line, and the corresponding entry in coldat is set to that line number.
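Fixed-format parsing amounts to running sscanf(3) at a given character offset. The sketch below, with the hypothetical helper parse_fixed_line, shows this for one line; colpos is 1-based and 0 stores the line number, as described above. A field width in the format, such as "%4lf", is what lets adjacent columns with no separator be split.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: colpos[k] is the 1-based start position of column k,
 * colfmt[k] a scanf format for one double (e.g. "%lf" or "%4lf"), and
 * colpos[k] == 0 stores the line number. Returns 1 on success, 0 to skip. */
int parse_fixed_line(const char *line, int lineno, int ncol,
                     const int *colpos, const char **colfmt, double *out)
{
    size_t len = strlen(line);
    for (int k = 0; k < ncol; k++) {
        if (colpos[k] == 0) {
            out[k] = lineno;                          /* line number requested */
            continue;
        }
        if (colpos[k] < 0 || (size_t)colpos[k] > len)
            return 0;                                 /* line too short */
        if (sscanf(line + colpos[k] - 1, colfmt[k], &out[k]) != 1)
            return 0;                                 /* parse failure */
    }
    return 1;
}
```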
get_line reads the next line from the stream instr into line. It returns the length of the string read, or 0 at end of file. This routine is deprecated; the standard getline(3) should be used instead.
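For code still using get_line, the usual POSIX getline(3) loop looks like the sketch below (count_lines is just an illustrative name). Note the different end-of-file convention: getline returns -1, whereas get_line returned 0, and getline allocates and grows the line buffer itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Count the lines of a stream with getline(3). */
int count_lines(FILE *fp)
{
    char *line = NULL;   /* getline allocates and reallocates this */
    size_t cap = 0;
    int n = 0;
    while (getline(&line, &cap, fp) != -1)
        n++;
    free(line);          /* one free for the whole loop */
    return n;
}
```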
parse parses the character string in line into the double array dat, which has at most ndat entries. Parsing means that %n refers to column n in the character string (n must be larger than 0). %0 may also be referenced, meaning the current line number, which is passed in the argument linenr.
strinsert inserts the string b into a, replacing n characters of a.
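The description of strinsert leaves open where in a the replacement happens. The sketch below assumes it happens at the start of the string pointed to by a (so a caller passes a pointer into a larger buffer); that reading, and the name strinsert_sketch, are assumptions, not the gettab.c code. The caller must ensure the buffer has room when b is longer than n characters.

```c
#include <string.h>

/* Hypothetical sketch: replace the first n characters of a with b,
 * shifting the rest of a (including its '\0') left or right as needed. */
void strinsert_sketch(char *a, const char *b, int n)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t nn = (n < 0) ? 0 : (size_t)n;
    if (nn > la)
        nn = la;                             /* cannot replace past the end */
    memmove(a + lb, a + nn, la - nn + 1);    /* shift the tail, incl. '\0' */
    memcpy(a, b, lb);                        /* copy b into the gap */
}
```

For instance, replacing a two-character token such as %1 inside an expression with a numeric string is a natural use of this kind of routine.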
iscomment returns 1 if the line appears to be a comment (starts with ';', '#', '!', a blank or a newline), and 0 otherwise.
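Read literally, the rule above only inspects the first character of the line. The sketch iscomment_sketch implements exactly that literal reading; counting a tab as a "blank" and an empty string as a comment are assumptions, and the real iscomment may be more lenient (for example about data lines that begin with spaces).

```c
#include <string.h>

/* Hypothetical sketch of the iscomment() rule, read literally: the line is a
 * comment if it is empty or its first character is ';', '#', '!', a blank
 * (space or tab, an assumption) or a newline. */
int iscomment_sketch(const char *line)
{
    return line[0] == '\0' || strchr(";#! \t\n", line[0]) != NULL;
}
```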
An example of reading a full table into a two-dimensional mdarray2 and
adding 1 to each element:
table *t = table_open(filename, 0, 0);
int ncols = table_ncols(t);
int nrows = table_nrows(t);
mdarray2 d2 = table_md2(t);
table_close(t);
for (int i=0; i<nrows; i++)
    for (int j=0; j<ncols; j++)
        d2[i][j] += 1.0;

and here is an example of reading the table line by line, without any parsing,
but removing comment lines:
table *t = table_open(filename, 1, 0);
int n, nrows = 0;
while ((n = table_next_row(t)) >= 0) {
    if (n > 0) {                        /* skip blank and comment lines */
        nrows++;
        printf("%s\n", table_line(t));
    }
}
table_close(t);
dprintf(0, "Read %d lines\n", nrows);

and dealing with (and preserving) comments while reading in the whole table:
table *t = table_open(filename, 0, 0);
//? table_read(t);
int nrows = table_nrows(t);
string *sp = table_comments(t);
while (*sp)
    printf("%s\n", *sp++);              /* echo the preserved comments */
for (int j=0; j<nrows; j++) {
    real *rp = table_rowrp(t, j);       /* row j as reals */
    /* ... process rp ... */
}
table_close(t);
cat AAPL.csv | xsv table | head -2
cat AAPL.csv | xsv slice -i 1 | xsv table
cat AAPL.csv | xsv slice -i 1 | xsv flatten
cat AAPL.csv | xsv count
https://github.com/BurntSushi/xsv
https://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c_user/cfitsio.html
src/kernel/tab table.c gettab.c
xx-sep-88   V1.0 written                              PJT
6-aug-92    documented get_Xtable functions           PJT
1-sep-95    added iscomment()                         PJT
12-jul-03   fixed reading large table buffering       PJT
aug-2020    designing new table system                Sathvik/PJT