Table of Contents
tabnllsqfit - general purpose non-linear least squares fitting program
tabnllsqfit in=table-file [parameter=value]
tabnllsqfit
takes a non-linear least squares fit (minimized ChiSquared) of a set of
datapoints (x(i),y(i)) where a functional form y=f(x;a0,a1,...,an) is assumed.
The function is not required to be non-linear in it parameters. Although
a number of common pre-defined functional forms can be selected (see fit=),
you can also write a small C routine, and dynamically load the fitted function
during runtime (see load=).
The inputfile is an ascii file in a tabular
format, where on every line there must be an assigned column for the X-
and Y-coordinate(s). An optional column can be assigned to the errors in
Y (the DY-coordinates, the inverse square of these being used as a weight).
If no DY-column is assigned, it is assumed that the errors in Y are all
the same and equal to 1. Lines starting with the '#' symbol are comment lines.
Certain models (see fit= below) allow multiple X or Y columns.
Each fit
can also produce an output file with an additional column computing the
difference between the data and the model (in that order). In the special
case where the user has supplied no free parameters, the residuals are
still computed correctly.
The following parameters are recognized
in any order if the keyword is also given:
- in=in-file
- (ascii) input file,
a table of values from which data is taken. No default.
- xcol=col,...
- The column(s)
from which the (independant) X-values are taken. The first column is numbered
1 [default: 1].
- ycol=col,...
- The column(s) from which the (dependant) Y-values
are taken [default: 2].
- dycol=int
- Optional column from which the weights
will be derived. By default this column is meant to refer to an error bar,
and normally the inverse square of the errors are uses as weights for
those fits where these are used (see dypow= next). [default: none assigned].
- dypow=power
- By changing this value, you can adjust the weights column (dycol).
Instead of the normal weight = 1/sigma^2, you will be able to use them as
weight^dypow. The most common use is when your dycol does not refer to an
error, but to the weight directly (e.g. an intensity), in which case dypow=-0.5
is appropriate. [default: 1].
- xrange=xmin:xmax
- Range in X-values for which
the fit is done. If only one number is given, the other one is considered
zero. This keyword cannot be used if more than one X-column is used. [default:
no entry, i.e. all].
- fit=model
- Model name what to fit. Currently implemented
are line, plane, poly, gauss, loren, and exp. [default: line].
- order=order
- Highest order of the polynomial fit (fit=poly) or number of dimension
of the hyper-plane fit (fit=plane). 0 would fit a constant. [Default: 0].
- out=filename
- Optional output filename where the data and residuals are
stored. The first few columns contain the X and Y columns, the last column
contains the residual Y-F(X). See an example below how to plot the data with
a model fit overlayed. [default: no output file created].
- nsigma=sigma_factor
- A positive number will delete points more than sigma_factor from the fit,
and fit again. Multiple values can be given here, in which case multiple
iterations are done. [Default: -1].
- par=
- Initial estimates for the parameters.
For non-linear fits, or linear fits where certain parameters are fixed (see
free= below) they need to be given here. By default they are all 0.
- free=
- A list of 1's and 0's to set which parameters are to be kept free (1)
and
which are fixed (0). By default all parameters are free, i.e. fitted. If any
parameter is fixed, initial estimates for all need to be given (see par=).
Default: all parameters free.
- load=
- If given, this should contain the full
path to a compiled and shared object (.so) file) that contains the function
to be fitted in terms of the nllsqfit parameters f and df. They must be
named func_loadobj and derv_loadobj (maybe modified with fit=). See LOAD
FUNCTIONS below.
- x=
- If given, these are the X-values for which the load function
can be tested. It is useful for testing and debugging your function. It will
report the X, Y and the dY/dPAR vector values. The program will exit after
this, and no fitting is done. Default: none.
- nmax=max_lines
- Maximum number
of lines allocated if the input file was being read from a pipe. If not,
the routine file_lines(3NEMO)
is used to allocate space for the table. [Default:
10000].
- tol=
- Tolerance for convergence of nllsqfit [Default: 0.0].
- lab=
- Mixing
parameter for nllsqfit [Default: 0.0 for linear, 0.1 for non-linear fits].
- itmax=
- Maximum number of allowed nllsqfit iterations [Default: 50]
- format=
- Output format. Default %g
- bootstrap=
- Number of bootstrap samples to take
to estimate the error in the parameters. Output off all parameters and their
errors are on one line containing the word bootstrap. Parameters and their
errors are output in pairs. Default:0
- seed=
- Initial seed. See xrandom(3)
for more details. Default: 0
Parameters are referred to as
p0,p1,p2,p3,.....
line y=p0+p1*x (same as poly,order=1)
plane z=p0+p1*x+p2*y (example order=2)
polynomial y=p0+p1*x+p2*x^2+p3*x^3... (example order=3)
gauss1d y=p0+p1*exp(-(x-p2)^2/(2*p3^2))
gauss2d y=p0+p1*exp(-[(x-p2)^2+(x-p3)^2]/(2*p4^2))
exp y=p0+p1*exp(-(x-p2)/p3)
arm y=p0+p1*cos(x)+p2*sin(x)
arm3 y=p0+p1*cos(x)+p2*sin(x)+p3*cos(3*x)+p4*sin(3*x)
loren y=p1/PI( (x-p0)^2 + p1^2 )
Here is an example of a linear fit: a straight line, with some
added noise and random weights between 1 and 2: (ieck, can't reproduce this
anymore)
% nemoinp 1:100 |\
tabmath - - "%1+rang(0,10),ranu(1,2)" seed=123 |\
tabnllsqfit - 1 2 3
nrt=0
Fitting a+bx:
a= 1.87458 2.29818
b= 0.961672 0.0398614
Here is an example of a 2D plane in 3D: (1+2x+3y)
% ccdmath "" - '1+2*%x+3*%y+rang(0,0.1)' 5,5 seed=123 |\
ccdprint - x= y= label=x,y newline=t |\
tabnllsqfit - 1,2 3 fit=plane order=2
nrt=0
Fitting p0+p1*x1+p2*x2+.....pN*xN: (N=2)
p0= 1.0688 0.0523819
p1= 2.01497 0.0165646
p2= 2.97436 0.0165646
And a fit to a gaussian:
% nemoinp 1:100 |\
tabmath - - '4+exp(-(%1-50)**2/(200))+ranu(0,1)' seed=123 |\
tabnllsqfit - fit=gauss par=4,1,50,10
nrt=13
Fitting a+b*exp(-(x-c)^2/(2*d^2)):
a= 4.46714 0.0416026
b= 1.13036 0.0994723
c= 50.2263 0.845469
d= 8.70728 0.959347
rms2/chi2= 8.92068
rms/chi = 1
Here is a contrived example of plotting the function to be plotted, by
fixing all parameters and computing a residual table from 0s:
% nemoinp 0:10:0.1 | tabmath - tab0 0
% tabnllsqfit tab0 1 2 fit=gauss par=1,2,5,1 free=0,0,0,0 out=tab0.d
% tabmath tab0.d - %1,-%3 | tabplot -
Here is an example of removing outlier points and fitting again:
% nemoinp 1:10 |\
tabmath - - '2*%1+1+rang(0,0.1)' seed=123 |\
tabnllsqfit - fit=line nsigma=1.5::3
nrt=0
Fitting a+bx:
a= 1.09548 0.0775617
b= 1.99937 0.0125002
2/10 points outside 1.5*sigma (0.152328)
nrt=0
Fitting a+bx:
a= 1.02651 0.0452119
b= 2.01358 0.00753531
0/8 points outside 1.5*sigma (0.080422)
Although 3 iterations were requested, after the first iteration no more
points were removed, and the iterations were stopped.
Here is an example
of estimating the errors via a bootstrap (resampling of errors) method.
Fitting a polynomial of order 2 and taking 100 bootstrap samples:
% tabnllsqfit tab11 fit=poly order=2 bootstrap=100
nrt=0
Fitting p0+p1*x+p2*x^2+.....pN*x^N: (N=2)
p0= 3.11325 0.192787
p1= 1.97474 0.0896959
p2= 0.00157311 0.008639
bootstrap= 3.11603 0.164922 1.96464 0.086429 0.00293632 0.00820007
^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^^ ^^^^^^
P0 dP0 P1 dP2 P3 dP3
With the load= keyword dynamic object files can be loaded
using the loadobj(3NEMO)
mechanism. The convention is that two functions
must be externally visible, and named func_method and derv_method (where
method is the same as the fit= keyword.
Here is an example of the file myline.c
that can be used with fit=line load=myline.so and compiled with
bake myline.so
/* File: myline.c */
#include <stdinc.h>
real func_line(real *x, real *p, int np)
{
return p[0] + p[1]*x[0];
}
void derv_line(real *x, real *p, real *e, int np)
{
e[0] = 1.0;
e[1] = x[0];
}
One word of caution: if you find the program having a hard time finding
a solution in complex cases, it is quite possible that this is not due
to the fact that the function is complex, but due to
It will not
recognize linear fits if the non-linear parameters are kept fixed, e.g. the
offset p0 in fit=gauss.
tablsqfit(1NEMO)
,hist(1NEMO)
, tabmath(1NEMO)
,
gaussfit(1NEMO)
, linreg(1NEMO)
, nllsqfit(3NEMO)
, fit.dc1(GIPSY)
Numerical
Recipies in C, Ch.14
NLREG: http://www.nlreg.com
NIST non-linear: http://www.itl.nist.gov/div898/strd/lls/lls.shtml
NIST linear: http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml
fityk:
http://www.unipress.waw.pl/~wojdyr/fityk/
Peter Teuben
~/src/kernel/tab tabnllsqfit.c
~/src/kernel/tab/fit example fitting functions
12-jul-02 V1.0 cloned off tablsqfit PJT
17-jul-02 V1.1 added load=, x=, numrec= PJT
11-sep-02 V1.1e changes error/warning to accomodate residual writen PJT
21-nov-02 V1.4 nsigma= can be an array of iterations PJT
14-feb-03 V1.6 arm,arm3 for Rahul PJT
21-mar-03 V1.7 added bootstrap=, seed= PJT
4-apr-03 V1.8 fixed error in using dycol=, and introduced dypow= PJT
15-mar-04 V1.8b added fit=loren and corrected lab= setting for functions PJT/RS
21-nov-05 V2.0 added fit=gauss1d,gauss2d PJT
Table of Contents