Data Analysis and Fitting

George E. Hrabovsky

MAST

Introduction

In the last lesson we discussed how to use some basic mathematical techniques in Mathematica that are relevant to computational science. In this lesson we will use those techniques as a foundation and study how to use a computer, specifically Mathematica, to analyze data. This is a huge subject, worthy of a course in itself, so we will study only those techniques that lead to other computational methods we can use to make models down the road.

What we mean by data is a set of results distributed with respect to some independent variable, where we may not already know the form of the dependence. Say, for example, that we have a set of n values y_1, y_2, …, y_n, indexed from 1 to n. We think of these values as the result of some unknown function, f, of values x_1, x_2, …, x_n, also indexed from 1 to n. We can write this,

y_i = f(x_i),    i = 1, 2, …, n

(2.1)

The task of data analysis is to try to figure out what f is by examining the data. Such a result is often called an empirical formula. Many physical laws begin as this kind of empirical formula.

One question is, “Can we apply these techniques to computational modeling?” The answer is yes. Say you have a numerical result that constitutes a list of ordered pairs. Using these techniques you might estimate an empirical formula to describe them.

Matrix Operations

One way of thinking about data is as a list. A list is little more than a matrix, so we can think of a set of data as a kind of matrix, even if it is only a column or row matrix. As an example, let's say that we have determined that our function is a polynomial,

f(x) = c_1 + c_2 x + c_3 x^2 + … + c_n x^{n-1}

(2.2)

To solve this system for n values of x, we can write it as a matrix applied to a vector of coefficients, set equal to the vector of data values.

  ( 1   x_1   x_1^2   …   x_1^{n-1} )   ( c_1 )   ( y_1 )
  ( 1   x_2   x_2^2   …   x_2^{n-1} )   ( c_2 )   ( y_2 )
  (  ⋮    ⋮     ⋮            ⋮      ) . (  ⋮  ) = (  ⋮  )
  ( 1   x_n   x_n^2   …   x_n^{n-1} )   ( c_n )   ( y_n )

(2.3)

As an exercise, rewrite this for arbitrary functions of the x_i, where we can assume an arbitrary fitting function for each value.

How do we solve this in Mathematica?

l2_7.png

l2_8.png

l2_9.png

We can write the solution for the coefficients in order.

l2_10.png

l2_11.png

What we call sol1 is what we had earlier called cvec: the first solution is c_1, the second c_2, and the third c_3.
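
As a minimal sketch of this workflow (the sample points, data values, and the quadratic model below are made up for illustration; only the names mat1, yvec, and sol1 echo the ones used above), one might write:

    xvals = {1, 2, 3};                         (* made-up sample points *)
    yvec  = {2, 5, 10};                        (* made-up data values *)
    mat1  = Table[x^p, {x, xvals}, {p, 0, 2}]  (* the matrix of powers 1, x, x^2 *)
    sol1  = LinearSolve[mat1, yvec]            (* evaluates to {1, 0, 1}, i.e. y = 1 + x^2 *)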

We can rename this matrix mat1 as S. We can call cvec by the symbol c, and yvec as y. We then rewrite (2.3),

S c = y

(2.4)

Matrix multiplication is accomplished by using the dot symbol.

l2_16.png

l2_17.png

Matrix transpose is also easily done.

l2_18.png

l2_19.png

Determinants are also easily found.

l2_20.png

l2_21.png
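
For concreteness, here is a small sketch of these three operations; the matrix and vector below are made up and are not the ones shown in the cells above:

    mA = {{2, 1, 0}, {1, 3, 1}, {0, 1, 2}};   (* a made-up square matrix *)
    vA = {1, 0, 2};                           (* a made-up vector *)
    mA . vA          (* matrix multiplication with the dot symbol *)
    Transpose[mA]    (* the transpose *)
    Det[mA]          (* the determinant, here 8 *)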

It is often useful to factor a square matrix into a product of matrices with special structure, such as triangular, orthogonal, or diagonal factors. This is called a decomposition, and there are many different types of decomposition.

The first method we use decomposes the matrix into the product of a unitary matrix called Q and an upper triangular matrix called R; we call this the QR decomposition. This only works for numerical matrices.

l2_22.png

l2_23.png

l2_24.png

l2_25.png

l2_26.png

l2_27.png

l2_28.png

The first matrix is Q, the second R.
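
A sketch on a made-up numerical matrix (in Mathematica's convention the original matrix is recovered as ConjugateTranspose[q].r):

    mA = N[{{2, 1, 0}, {1, 3, 1}, {0, 1, 2}}];   (* a made-up numerical matrix *)
    {q, r} = QRDecomposition[mA];
    q    (* the unitary factor *)
    r    (* the upper triangular factor *)
    Chop[ConjugateTranspose[q] . r - mA]         (* should be the zero matrix *)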

Another decomposition converts the matrix into a product of lower and upper triangular matrices, called the LU decomposition. In Mathematica the first result is a matrix packing the L and U factors together, the second is a vector that specifies the rows used for pivoting, and the third is the condition number (for a numerical matrix).

l2_29.png

l2_30.png

l2_31.png

l2_32.png

l2_33.png

l2_34.png

l2_35.png

l2_36.png
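
A sketch on a made-up numerical matrix; the packed lu result can be split into separate L and U factors with LowerTriangularize and UpperTriangularize:

    mB = N[{{2, 1, 0}, {1, 3, 1}, {0, 1, 2}}];             (* a made-up numerical matrix *)
    {lu, perm, cond} = LUDecomposition[mB];
    lu       (* the L and U factors packed into one matrix *)
    perm     (* the row-pivoting vector *)
    cond     (* the condition number estimate *)
    l = LowerTriangularize[lu, -1] + IdentityMatrix[3];    (* unit lower triangular factor *)
    u = UpperTriangularize[lu];                            (* upper triangular factor *)
    Chop[l . u - mB[[perm]]]                               (* should be the zero matrix *)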

A decomposition into a pair of orthogonal (unitary) matrices and a diagonal matrix of singular values is called the singular value decomposition, SVD.

l2_37.png

l2_38.png

l2_39.png

l2_40.png

These three matrices may be labeled u (an n×n orthogonal matrix), s (an n×m diagonal matrix of singular values), and v (an m×m orthogonal matrix). Writing v^† for the conjugate transpose of v, the decomposition is then,

m = u s v^†

(2.5)
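
A sketch on a made-up numerical matrix, verifying equation (2.5):

    mC = N[{{2, 1, 0}, {1, 3, 1}, {0, 1, 2}}];   (* a made-up numerical matrix *)
    {u, s, v} = SingularValueDecomposition[mC];
    s                                            (* diagonal matrix of singular values *)
    Chop[u . s . ConjugateTranspose[v] - mC]     (* should be the zero matrix *)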

When we solve a matrix equation we invert the matrix. This is fine so long as the matrix is not singular: if the determinant of the matrix is zero, it is said to be singular, and a singular matrix cannot be inverted. A robust alternative is the Moore-Penrose pseudoinverse, which exists even for singular matrices.

l2_42.png

l2_43.png

Since mat2 is nonsingular the inverse can be found.

l2_44.png

l2_45.png

l2_46.png

l2_47.png

l2_48.png

l2_49.png
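
A sketch contrasting the two on made-up matrices (the names here are mine, not the mat2 used above):

    mD = {{2., 1., 0.}, {1., 3., 1.}, {0., 1., 2.}};   (* made up, nonsingular *)
    Inverse[mD]           (* the ordinary inverse exists here *)
    PseudoInverse[mD]     (* and agrees with it for a nonsingular matrix *)
    mSing = {{1., 2.}, {2., 4.}};   (* made up, singular: the rows are proportional *)
    Det[mSing]            (* 0, so Inverse would fail *)
    PseudoInverse[mSing]  (* the Moore-Penrose pseudoinverse still exists *)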

Exact Fitting

If we have perfect information about the data points, then we can fit them exactly. In that case we write an equation similar to (2.4) and solve for the coefficients,

c = S^{-1} y

(2.6)

by applying the inverse of the transformation matrix to the vector of the data we want to fit.

For example, we have a set of data points.

l2_51.png

We can see these points.

l2_52.png

l2_53.gif

We have six points, so let’s try to fit a set of polynomials to the points.

l2_54.png

l2_55.png

l2_56.png

l2_57.png

l2_58.png

l2_59.png

l2_60.png

l2_61.png

l2_62.png

l2_63.png

We can see which of these is the best fit.

l2_64.png

l2_65.gif

We can see that the fifth-order fit is the best, connecting every point.

l2_66.png

l2_67.gif
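
The cells above are shown as images; a hedged sketch of the same procedure on six made-up points (not the author's data), fitting a fifth-order polynomial exactly by inverting the matrix as in equation (2.6), might look like:

    pts = {{1, 2.0}, {2, 3.1}, {3, 5.9}, {4, 9.0}, {5, 15.8}, {6, 26.2}};   (* made-up data *)
    xs = pts[[All, 1]]; ys = pts[[All, 2]];
    S = Table[x^p, {x, xs}, {p, 0, 5}];         (* the 6x6 transformation matrix *)
    c = Inverse[S] . ys;                        (* exact-fit coefficients, as in (2.6) *)
    f[x_] := c . {1, x, x^2, x^3, x^4, x^5};    (* the fitted polynomial *)
    Show[Plot[f[x], {x, 1, 6}], ListPlot[pts, PlotStyle -> PointSize[Large]]]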

Approximate Fitting

It is almost impossible to have exact information about measured data. All measurements have errors, sometimes due to noise and sometimes due to systematic uncertainties. So how do we deal with this? We can use the same sort of method, though with a different formulation. In this case we still have n data points, but now only m terms in the fitting function.

y_i ≈ c_1 f_1(x_i) + c_2 f_2(x_i) + … + c_m f_m(x_i),    i = 1, 2, …, n

(2.7)

The solution will be of the form,

S c = y,    where S_{ij} = f_j(x_i)

(2.8)

Here the S matrix is an n×m rectangular matrix (n data points, m terms) instead of the square matrix we had before. The problem is that we cannot invert such a matrix, and we are left with the fact that we cannot produce an exact fit to this data. Instead we need to discover the combination of functions that gives the best fit to the data. We call these functions basis functions.

The best fit is determined by minimizing the distance between each data point and the curve of the fitted function. Writing a data point as an ordered pair (x_i, y_i), the corresponding function value is f(x_i). The square of the distance between these is then (y_i - f(x_i))^2. We call the sum of these squared distances the residual, and we denote it χ^2, using the Greek letter χ.

χ^2 = Σ_{i=1}^{n} (y_i - f(x_i))^2

(2.9)

The smaller this number is, the better the fit. We call this method linear least squares. More generally, given some matrix m and some argument b (which can be a vector or a matrix), we have the matrix equation

m . x = b

(2.10)

The method of least squares finds the value of x that minimizes the norm |m x - b|. In Mathematica we can use the LeastSquares command.

l2_75.png

l2_76.png

l2_77.png

l2_78.png

We can check this.

l2_79.png

l2_80.png
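
As a sketch of the command on a small made-up overdetermined system (not the one shown above):

    m = {{1., 1.}, {1., 2.}, {1., 3.}, {1., 4.}};   (* made-up 4x2 matrix: more equations than unknowns *)
    b = {1.1, 1.9, 3.2, 3.9};                       (* made-up right-hand side *)
    x = LeastSquares[m, b]                          (* the x minimizing |m.x - b| *)
    Norm[m . x - b]                                 (* the (small) residual norm *)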

This function has existed in Mathematica since version 6, but in the new release (version 13) you may choose one of several methods.

l2_81.png

l2_82.png

This is for dense or sparse matrices.

l2_83.png

l2_84.png

This is good for dense matrices.

l2_85.png

l2_86.png

This is good for dense or sparse machine number matrices.

l2_87.png

l2_88.png

This is good for sparse machine number matrices.

The problem of applying the least squares method to data is to discover the coefficients that minimize the residual. This is the purpose of the Moore-Penrose pseudoinverse.

If we apply the pseudoinverse to the transformation matrix by matrix multiplication we get an identity matrix. Again we have c = S^+ y, where S^+ is the pseudoinverse of S and c is the solution of the least squares problem.
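
A sketch of both statements on made-up data, with a straight-line basis (the names S, Sp, and c follow the notation in the text; the numbers are invented):

    xs = {1., 2., 3., 4., 5.};
    ys = {1.2, 1.9, 3.2, 3.8, 5.1};         (* made-up measurements *)
    S  = Table[{1, x}, {x, xs}];            (* basis functions 1 and x at each data point *)
    Sp = PseudoInverse[S];
    Chop[Sp . S]                            (* the identity matrix on the coefficient space *)
    c  = Sp . ys                            (* the least squares coefficients *)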

Here we see an example of this. Say we have a set of points.

l2_90.png

l2_91.png

We can plot these.

l2_92.png

l2_93.gif

We can get the data points without the uncertainties for fitting.

l2_94.png

l2_95.png


l2_96.png

l2_97.png

We can attempt to fit this set of data with various polynomial fits.

l2_98.png

l2_99.png

l2_100.png

l2_101.png

l2_102.png

l2_103.png

l2_104.png

l2_105.png

l2_106.png

l2_107.png

l2_108.png

l2_109.png

l2_110.png

l2_111.gif

None of them is doing particularly well: few of the fitted curves pass within the error bars, and none is a very good fit. The best is the fifth-order polynomial fit.
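
The fits above are shown as images; a hedged sketch of this kind of comparison on made-up data, using the built-in Fit command (the author's actual data and code may differ), is:

    data = {{1, 1.3}, {2, 1.8}, {3, 3.4}, {4, 3.6}, {5, 5.6},
            {6, 5.9}, {7, 7.5}, {8, 8.7}, {9, 9.4}, {10, 11.2}};     (* made-up data *)
    fits = Table[Fit[data, Table[x^p, {p, 0, k}], x], {k, 1, 5}];    (* 1st- through 5th-order fits *)
    Show[ListPlot[data, PlotStyle -> PointSize[Large]],
      Plot[Evaluate[fits], {x, 1, 10}]]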

We can try other functions.

l2_112.png

l2_113.png

l2_114.png

l2_115.gif

As an exercise, look up the FindFit and LeastSquares commands (particularly the Applications section of the LeastSquares documentation). Try applying these functions to the data given above.

There is another effective way to smooth out these fits. We introduce a measure of the roughness of the fit, and we seek to minimize this roughness. The roughness represents the lack of smoothness in the fitted curve. What constitutes the roughness? It can be pretty much anything in the form of the product of a matrix and the fit coefficients. One basic assumption is homogeneity. If R represents the roughness matrix and c is the coefficient vector, then we can write the homogeneity condition

R c = 0

(2.11)

Here we note that R is an n_r × m matrix, n_r being the number of specific roughness constraints. We can assume that this equation cannot be satisfied exactly (a perfectly smooth function has no roughness and cannot also fit the data). On the other hand, we seek to minimize the square of the roughness, along lines similar to the linear least squares method; we expect this to look something like |R c|^2. We call this regularization. We then construct a compound matrix system: if we adopt the symbol λ to control the weight of the smoothing, the first n rows are S and the next n_r rows are λ R. Thus we rewrite (2.10)

  ( S   )       ( y )
  ( λ R ) . c = ( 0 )

(2.12)

We can solve this for c, thus finding the set of coefficients that gives the best fit. The total residual of all of this

|S c - y|^2 + λ^2 |R c|^2

(2.13)

is minimized.
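
As a sketch of how the compound system (2.12) can be assembled and solved by hand (the data, the fifth-order basis, the second-difference roughness, and λ = 1 are all choices made up for illustration):

    xs = Range[1., 10.];
    ys = {1.3, 1.8, 3.4, 3.6, 5.6, 5.9, 7.5, 8.7, 9.4, 11.2};   (* made-up data *)
    S  = Table[x^p, {x, xs}, {p, 0, 5}];      (* fifth-order polynomial basis *)
    d2 = Table[Which[j == i, 1., j == i + 1, -2., j == i + 2, 1., True, 0.], {i, 8}, {j, 10}];
    R  = d2 . S;                   (* second differences of the fitted values, as a matrix times c *)
    lam = 1.;                      (* the weight λ of the smoothing *)
    sBig = Join[S, lam R];         (* first n rows are S, the next n_r rows are λ R *)
    yBig = Join[ys, ConstantArray[0., 8]];    (* the data, then zeros for the roughness rows *)
    c = LeastSquares[sBig, yBig]              (* the regularized coefficients *)

Increasing lam trades fidelity to the data for smoothness of the fitted values.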

A useful idea is to let the roughness function be the second derivative of the fitted function at specific points x_i,

(R c)_i = f''(x_i),    i = 1, …, n_r

(2.14)

This is equivalent to something that we could call a smoothing spline.

We can test this out. Beginning with version 12 Mathematica has the ability to perform regularization.

l2_122.png

l2_123.png

Here we have chosen the second-order difference to represent the second-derivative roughness function. Since the vertical scale of the data is of order 1, we have chosen λ to be 1.
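
If I recall the version 12 interface correctly, this regularization can be requested directly through an option of Fit; treat the FitRegularization option name and the "Curvature" setting below as assumptions to check against the documentation, and the data as made up:

    data = {{1, 1.3}, {2, 1.8}, {3, 3.4}, {4, 3.6}, {5, 5.6},
            {6, 5.9}, {7, 7.5}, {8, 8.7}, {9, 9.4}, {10, 11.2}};   (* made-up data *)
    (* "Curvature" penalizes the second derivative; the 1 is the weight λ *)
    smooth = Fit[data, Table[x^p, {p, 0, 5}], x, FitRegularization -> {"Curvature", 1}]
    Show[ListPlot[data], Plot[smooth, {x, 1, 10}]]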

l2_124.png

l2_125.gif

What happens if we choose a different λ?

l2_126.png

l2_127.png

l2_128.gif

There is a little bit of variance, but not too much.

We can apply this regularization to solving equations for rough functions, too.

Interpolation

It is possible to approximate the function describing a set of data by defining a smooth curve that connects all of its points. This is called interpolation. In Mathematica we use the Interpolation command.

l2_129.png

l2_130.png

For example.

l2_131.png

l2_132.gif

We can see a plot of this.

l2_133.png

l2_134.png

l2_135.gif

We can use the interpolated function just like any other function, though it may not be possible to get a formula for it; it is an entirely numerical approximation of a function.
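
A short sketch on made-up points (Interpolation builds piecewise polynomials, of order 3 by default):

    pts  = {{1, 1.}, {2, 4.}, {3, 9.}, {4, 16.}, {5, 25.}};   (* made-up data *)
    ifun = Interpolation[pts]       (* returns an InterpolatingFunction *)
    ifun[2.5]                       (* evaluate it like any other function *)
    Plot[ifun[x], {x, 1, 5}]        (* the curve passes through every data point *)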

Image Reconstruction

If we shoot a beam of x-rays through an object, we can measure the attenuation of the beam and get an estimate of the density of the material along the path of the beam. If we do this many times along chords in a plane section of the material, we can combine these chordal (line-integrated) measurements to reconstruct the density over that plane section. This process is called x-ray tomography.

Given a set of coefficients c_j and a set of basis functions b_j defined over the plane section, we can represent the density at a point r of the section in this form,

f(r) = Σ_{j=1}^{m} c_j b_j(r)

(2.15)

The basis functions can be as simple as a set of pixels covering the plane area, or they can be more complicated, possibly requiring a second index for the second dimension in the plane.

Each measurement chord passes through several of the basis functions. Thus, for a given set of coefficients we get a chordal value of our measurement along each line of sight L_i,

y_i = ∫_{L_i} f dl = Σ_{j=1}^{m} c_j ∫_{L_i} b_j dl

(2.16)

This allows us to write our fitting problem in a standard form.

S c = y,    where S_{ij} = ∫_{L_i} b_j dl

(2.17)

We can solve this problem with the pseudoinverse so long as the system is overdetermined, that is, so long as there are more independent chordal measurements than basis functions.

In practice this is almost never the case; for a variety of reasons the system is usually underdetermined (there might not be enough independent chordal measurements to determine the density of each pixel). Worse, the noise inherent in the chordal measurements is amplified by the inversion process, producing unwanted artifacts.

The solution is smoothing by regularization. The roughness function is a Laplacian that is evaluated at each pixel.
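
A hedged sketch of the whole procedure on a tiny made-up example (the chord matrix and measurements below are random stand-ins just to make the sketch runnable; in a real reconstruction each row of S would hold the path length of one chord through each pixel):

    n = 8; nPix = n^2;                            (* a made-up 8x8 pixel grid *)
    nChords = 20;
    S = RandomReal[{0, 1}, {nChords, nPix}];      (* stand-in chord matrix *)
    y = RandomReal[{0, 1}, nChords];              (* stand-in chordal measurements *)
    (* discrete Laplacian acting on the flattened pixel vector *)
    d2  = SparseArray[{{i_, i_} -> -2., {i_, j_} /; Abs[i - j] == 1 -> 1.}, {n, n}];
    lap = Normal[KroneckerProduct[d2, IdentityMatrix[n]] +
                 KroneckerProduct[IdentityMatrix[n], d2]];
    lam = 0.1;                                    (* smoothing weight, chosen arbitrarily *)
    c = LeastSquares[Join[S, lam lap], Join[y, ConstantArray[0., nPix]]];
    ArrayPlot[Partition[c, n]]                    (* the reconstructed pixel image *)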
