There are some related questions that I’ve come across (like this, this, this, and this) but they all deal with fitting data to a known curve. Is there a way to fit given data to an unknown curve? By which I mean, given some data the algorithm will give me a fit which is one function or a sum of functions. I’m programming in C, but I’m at a complete loss on how to use the gsl package to do this. I’m open to using anything that can (ideally) be piped through C. But any help on what direction I should look will be greatly appreciated.
EDIT: This is basically experimental (physics) data that I’ve collected, so the data will have some trend modified by additive Gaussian noise. In general the trend will be non-linear, so I guess that a linear regression fitting method will be unsuitable. As for the ordering, the data is time-ordered, so the curve necessarily has to be fit in that order.
You might be looking for polynomial interpolation, in the field of numerical analysis.
In polynomial interpolation, given a set of points (x,y), you are trying to find the polynomial that passes through these points. One way to do it is using Newton interpolation, which is fairly easy to program.
The field of numerical analysis, and interpolation in particular, is widely studied, and you might be able to get a nice upper bound on the error of the polynomial.
Note, however, that because you are looking for a polynomial that best fits your data, and the underlying function is not really a polynomial, the error grows very quickly once you move away from your initial training set.
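To make that concrete, here is a small sketch in C. Everything in it is my own assumption for illustration: the "true" trend 1/(1+x^2), the six sample points at x = 0..5, and the function names. It uses the Lagrange form only because it is the shortest way to evaluate the interpolating polynomial.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "true" trend, chosen only for illustration; not a polynomial. */
double true_trend(double x)
{
    return 1.0 / (1.0 + x * x);
}

/* Evaluate at t the unique polynomial through (x[i], y[i]), Lagrange form.
 * The x values must be pairwise distinct. */
double lagrange_eval(const double *x, const double *y, size_t n, double t)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double basis = 1.0;
        for (size_t j = 0; j < n; j++) {
            if (j != i)
                basis *= (t - x[j]) / (x[i] - x[j]);
        }
        sum += y[i] * basis;
    }
    return sum;
}

/* Absolute gap between the true trend and the degree-5 polynomial that
 * interpolates it at x = 0, 1, 2, 3, 4, 5. */
double interpolation_error(double t)
{
    double x[6], y[6];
    for (size_t i = 0; i < 6; i++) {
        x[i] = (double)i;
        y[i] = true_trend(x[i]);
    }
    double diff = lagrange_eval(x, y, 6, t) - true_trend(t);
    return diff < 0.0 ? -diff : diff;
}
```

Calling interpolation_error(2.5), a point between the samples, gives a small value, while interpolation_error(10.0), well outside the sampled range, gives an error far larger than the trend itself.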
Also note that your data set is finite, and there are infinitely many (in fact, uncountably many) functions that can fit the data (exactly or approximately), so which one of these is the best depends on what you are actually trying to achieve.
If you are looking for a model to fit your data, note that linear regression and polynomial interpolation are at opposite ends of the scale: polynomial interpolation might overfit the data, while linear regression might underfit it. Which one should be used is case-specific and varies from one application to another.
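For the linear regression end of that scale, a minimal sketch of an ordinary least-squares straight-line fit in plain C could look like this. The function name fit_line and its return convention are my own choices, not part of GSL or any other library.

```c
#include <assert.h>
#include <stddef.h>

/* Least-squares fit of y = slope * x + intercept to n points.
 * Returns 0 on success, -1 if the line is not determined
 * (fewer than two points, or all x values identical). */
int fit_line(const double *x, const double *y, size_t n,
             double *slope, double *intercept)
{
    assert(x && y && slope && intercept);
    if (n < 2)
        return -1;

    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    /* Denominator of the closed-form solution of the normal equations. */
    double denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return -1;

    *slope = ((double)n * sxy - sx * sy) / denom;
    *intercept = (sy - *slope * sx) / (double)n;
    return 0;
}
```

For the points (0,1), (1,2), (3,10) this gives roughly y = 3.14x + 0.14, a line that misses all three points, which is the underfitting side of the trade-off.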
Simple polynomial interpolation example:
Let’s say we have
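(0,1), (1,2), (3,10) as our data. The table (1) we get using Newton's method is:

x    y    1st    2nd
0    1
1    2    1
3    10   4      1

Now, the polynomial we get comes from the "diagonal" that ends with the last element (here 1, 1, 1):

p(x) = 1 + 1*(x-0) + 1*(x-0)*(x-1) = x^2 + 1

(and that is indeed a perfect fit to the data we used: p(0) = 1, p(1) = 2, p(3) = 10)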
(1) The table is built recursively: the first two columns are the x and y values, and each subsequent column is computed from the previous one by dividing the difference of two neighbouring entries by the difference of the corresponding x values. It is really easy to implement once you get it; the full explanation is on the Wikipedia page for Newton interpolation.
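Since you are working in C, here is a rough sketch of how the table and the resulting polynomial could be coded. The function names are mine, and it keeps only one column of the table in memory at a time, which is enough to end up with the diagonal.

```c
#include <assert.h>
#include <stddef.h>

/* On entry coef[0..n-1] holds the y values; on exit it holds the diagonal
 * of the divided-difference table, i.e. the Newton coefficients.
 * The x values must be pairwise distinct. */
void newton_coefficients(const double *x, double *coef, size_t n)
{
    assert(n > 0);
    for (size_t j = 1; j < n; j++) {
        /* Go bottom-up so the previous column's entries are still intact. */
        for (size_t i = n - 1; i >= j; i--)
            coef[i] = (coef[i] - coef[i - 1]) / (x[i] - x[i - j]);
    }
}

/* Evaluate coef[0] + coef[1]*(t-x[0]) + coef[2]*(t-x[0])*(t-x[1]) + ...
 * using a Horner-style nested scheme. */
double newton_eval(const double *x, const double *coef, size_t n, double t)
{
    assert(n > 0);
    double result = coef[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        result = result * (t - x[i]) + coef[i];
    return result;
}
```

With x = {0, 1, 3} and coef initialised to {1, 2, 10}, newton_coefficients leaves {1, 1, 1} in coef, and newton_eval then reproduces the three data points.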