Table of Contents
tabnllsqfit - general purpose non-linear least squares fitting program
tabnllsqfit in=table-file [parameter=value]
tabnllsqfit
takes a non-linear least squares fit (minimized ChiSquared) of a set of
datapoints (x(i),y(i)) where a functional form y=f(x;a0,a1,...,an) is assumed.
The function is not required to be non-linear in it parameters. Although
a number of common pre-defined functional forms can be selected (see fit=),
you can also write a small C routine, and dynamically load the fitted function
during runtime (see load=).
The inputfile is an ascii file in a tabular
format, where on every line there must be an assigned column for the X-
and Y-coordinate(s). An optional column can be assigned to the errors in
Y (the DY-coordinates, the inverse square of these being used as a weight).
If no DY-column is assigned, it is assumed that the errors in Y are all
the same and equal to 1. Lines starting with the '#' symbol are comment lines.
Certain models (see fit= below) allow multiple X or Y columns.
Each fit
can also produce an output file with an additional column computing the
difference between the data and the model (in that order). In the special
case where the user has supplied no free parameters, the residuals are
still computed correctly.
The following parameters are recognized
in any order if the keyword is also given:
- in=in-file
- (ascii) input file,
a table of values from which data is taken. No default.
- xcol=col,...
- The column(s)
from which the (independant) X-values are taken. The first column is numbered
1 [default: 1].
- ycol=col,...
- The column(s) from which the (dependant) Y-values
are taken [default: 2].
- dycol=int
- Optional column from which the weights
will be derived. By default this column is meant to refer to an error bar,
and normally the inverse square of the errors are uses as weights for
those fits where these are used (see dypow= next). [default: none assigned].
- dypow=power
- By changing this value, you can adjust the weights column (dycol).
Instead of the normal weight = 1/sigma^2, you will be able to use them as
weight^dypow. The most common use is when your dycol does not refer to an
error, but to the weight directly (e.g. an intensity), in which case dypow=-0.5
is appropriate. [default: 1].
- xrange=xmin:xmax
- Range in X-values for which
the fit is done. If only one number is given, the other one is considered
zero. This keyword cannot be used if more than one X-column is used. [default:
no entry, i.e. all].
- fit=model
- Model name what to fit. Currently implemented
are line, plane, poly, gauss, loren, and exp. [default: line].
- order=order
- Highest order of the polynomial fit (fit=poly) or number of dimension
of the hyper-plane fit (fit=plane). 0 would fit a constant. [Default: 0].
- out=filename
- Optional output filename where the data and residuals are
stored. The first few columns contain the X and Y columns, the last column
contains the residual Y-F(X). See an example below how to plot the data with
a model fit overlayed. [default: no output file created].
- nsigma=sigma_factor
- A positive number will delete points more than sigma_factor from the fit,
and fit again. Multiple values can be given here, in which case multiple
iterations are done. [Default: -1].
- par=
- Initial estimates for the parameters.
For non-linear fits, or linear fits where certain parameters are fixed (see
free= below) they need to be given here. Some fits (e.g. gauss1d) do not require
an initial estimate and attempt to come up with a reasonable one. Otherwise
all initial parameters are 0.
- free=
- A list of 1's and 0's to set which parameters
are to be kept free (1)
and which are fixed (0). By default all parameters
are free, i.e. fitted. If any parameter is fixed, initial estimates for all
need to be given (see par=). Default: all parameters free.
- load=
- If given,
this should contain the full path to a compiled and shared object (.so)
file) that contains the function to be fitted in terms of the nllsqfit
parameters f and df. They must be named func_loadobj and derv_loadobj (maybe
modified with fit=). See LOAD FUNCTIONS below.
- x=
- If given, these are the
X-values for which the load function can be tested. It is useful for testing
and debugging your function. It will report the X, Y and the dY/dPAR vector
values. The program will exit after this, and no fitting is done. Default:
none.
- nmax=max_lines
- Maximum number of lines allocated if the input file
was being read from a pipe. If not, the routine file_lines(3NEMO)
is used
to allocate space for the table. [Default: 10000].
- tol=
- Tolerance for convergence
of nllsqfit [Default: 0.0].
- lab=
- Mixing parameter for nllsqfit [Default:
0.0 for linear, 0.1 for non-linear fits].
- itmax=
- Maximum number of allowed
nllsqfit iterations [Default: 50]
- format=
- Output format. Default %g
- bootstrap=
- Number of bootstrap samples to take to estimate the error in the parameters.
Output off all parameters and their errors are on one line containing the
word bootstrap. Parameters and their errors are output in pairs. Default:0
- seed=
- Initial seed. See xrandom(3)
for more details. Default: 0
- numrec=t|f
- Use the classic (gipsy based) NLLSQFIT [t], or the Numerical Recipes based
one (f)? See also nllsqfit(3NEMO)
.
Parameters are referred
to as p0,p1,p2,p3,.....
line y=p0+p1*x (same as poly,order=1)
plane z=p0+p1*x+p2*y (example order=2)
polynomial y=p0+p1*x+p2*x^2+p3*x^3... (example order=3)
gauss1d y=p0+p1*exp(-(x-p2)^2/(2*p3^2))
gauss2d y=p0+p1*exp(-[(x-p2)^2+(x-p3)^2]/(2*p4^2))
exp y=p0+p1*exp(-(x-p2)/p3)
grow y=p0*(exp(x/p1)-1)
arm y=p0+p1*cos(x)+p2*sin(x)
arm3 y=p0+p1*cos(x)+p2*sin(x)+p3*cos(3*x)+p4*sin(3*x)
loren y=p1/PI( (x-p0)^2 + p1^2 )
psf y=p1*x^p2*sin(y)^p3+p4
Here is an example of a linear fit: a straight line, with some
added noise and random weights between 1 and 2: (ieck, can't reproduce this
anymore)
% nemoinp 1:100 |\
tabmath - - "%1+rang(0,10),ranu(1,2)" seed=123 |\
tabnllsqfit - 1 2 3
nrt=0
Fitting a+bx:
a= 1.87458 2.29818
b= 0.961672 0.0398614
Here is an example of a 2D plane in 3D: (1+2x+3y)
% ccdmath "" - '1+2*%x+3*%y+rang(0,0.1)' 5,5 seed=123 |\
ccdprint - x= y= label=x,y newline=t |\
tabnllsqfit - 1,2 3 fit=plane order=2
nrt=0
Fitting p0+p1*x1+p2*x2+.....pN*xN: (N=2)
p0= 1.0688 0.0523819
p1= 2.01497 0.0165646
p2= 2.97436 0.0165646
And a fit to a gaussian:
% nemoinp 1:100 |\
tabmath - - '4+exp(-(%1-50)**2/(200))+ranu(0,1)' seed=123 |\
tabnllsqfit - fit=gauss par=4,1,50,10
nrt=13
Fitting a+b*exp(-(x-c)^2/(2*d^2)):
a= 4.46714 0.0416026
b= 1.13036 0.0994723
c= 50.2263 0.845469
d= 8.70728 0.959347
rms2/chi2= 8.92068
rms/chi = 1
Here is a contrived example of plotting the function to be plotted, by
fixing all parameters and computing a residual table from 0s:
% nemoinp 0:10:0.1 | tabmath - tab0 0
% tabnllsqfit tab0 1 2 fit=gauss par=1,2,5,1 free=0,0,0,0 out=tab0.d
% tabmath tab0.d - %1,-%3 | tabplot -
Here is an example of removing outlier points and fitting again:
% nemoinp 1:10 |\
tabmath - - '2*%1+1+rang(0,0.1)' seed=123 |\
tabnllsqfit - fit=line nsigma=1.5::3
nrt=0
Fitting a+bx:
a= 1.09548 0.0775617
b= 1.99937 0.0125002
2/10 points outside 1.5*sigma (0.152328)
nrt=0
Fitting a+bx:
a= 1.02651 0.0452119
b= 2.01358 0.00753531
0/8 points outside 1.5*sigma (0.080422)
Although 3 iterations were requested, after the first iteration no more
points were removed, and the iterations were stopped.
Here is an example
of estimating the errors via a bootstrap (resampling of errors) method.
Fitting a polynomial of order 2 and taking 100 bootstrap samples:
% tabnllsqfit tab11 fit=poly order=2 bootstrap=100
nrt=0
Fitting p0+p1*x+p2*x^2+.....pN*x^N: (N=2)
p0= 3.11325 0.192787
p1= 1.97474 0.0896959
p2= 0.00157311 0.008639
bootstrap= 3.11603 0.164922 1.96464 0.086429 0.00293632 0.00820007
^^^^ ^^^^^ ^^^^^ ^^^^^ ^^^^^^ ^^^^^^
P0 dP0 P1 dP2 P3 dP3
With the load= keyword dynamic object files can be loaded
using the loadobj(3NEMO)
mechanism. The convention is that two functions
must be externally visible, and named func_method and derv_method (where
method is the same as the fit= keyword.
Here is an example of the file myline.c
that can be used with fit=line load=myline.so and compiled with
bake myline.so
/* File: myline.c */
#include <stdinc.h>
real func_line(real *x, real *p, int np)
{
return p[0] + p[1]*x[0];
}
void derv_line(real *x, real *p, real *e, int np)
{
e[0] = 1.0;
e[1] = x[0];
}
One word of caution: if you find the program having a hard time finding
a solution in complex cases, it is quite possible that this is not due
to the fact that the function is complex, but due to noise or bad initial
conditions.
It will not recognize linear fits if the non-linear parameters
are kept fixed, e.g. the offset p0 in fit=gauss.
tablsqfit(1NEMO)
,hist(1NEMO)
,
tabmath(1NEMO)
, gaussfit(1NEMO)
, linreg(1NEMO)
, nllsqfit(3NEMO)
, fit.dc1(GIPSY)
Numerical Recipies in C, Ch.14
NLREG: http://www.nlreg.com
NIST non-linear:
http://www.itl.nist.gov/div898/strd/lls/lls.shtml
NIST linear: http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml
fityk: http://fityk.nieto.pl/
A new scheme for calculating weights and describing
correlations in nonlinear least squares fits. (Hessler, Curent & Ogren, C.I.P.
10, 186, 1996): http://dx.doi.org/10.1063/1.168569
Peter Teuben
~/src/kernel/tab tabnllsqfit.c
~/src/kernel/tab/fit example fitting functions
12-jul-02 V1.0 cloned off tablsqfit PJT
17-jul-02 V1.1 added load=, x=, numrec= PJT
11-sep-02 V1.1e changes error/warning to accomodate residual writen PJT
21-nov-02 V1.4 nsigma= can be an array of iterations PJT
14-feb-03 V1.6 arm,arm3 for Rahul PJT
21-mar-03 V1.7 added bootstrap=, seed= PJT
4-apr-03 V1.8 fixed error in using dycol=, and introduced dypow= PJT
15-mar-04 V1.8b added fit=loren and corrected lab= setting for functions PJT/RS
21-nov-05 V2.0 added fit=gauss1d,gauss2d PJT
24-apr-08 V2.1 added psf PJT
7-may-10 V2.2 added grow PJT
24-dec-11 V2.3b estimate gauss1d if no initial par given PJT
Table of Contents