Consider the linear equation $Ax=b$, where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^{m\times 1}$, $x\in\mathbb{R}^{n\times 1}$, and $m>n$.
This system is overdetermined and in general has no exact solution. The simplest approach to finding $x$ is a least-squares fitting model, which finds the curve with the least difference between the value of the curve at a point and the value of the data at that point; i.e., it solves $\min_{x\in\mathbb{R}^{n}}\|Ax-b\|^{2}$. This amounts to saying that the data may be slightly perturbed,
$$Ax = b + r,$$
where $r$ is some residual noise, and then minimizing $\|r\|$:
$$\min_{x,\,r}\ \|r\| \quad\text{subject to}\quad Ax = b + r.$$
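The least-squares step can be sketched with NumPy's built-in solver. The matrix and data below are toy values chosen for illustration, not the spring-network quantities from this module:

```python
import numpy as np

# Toy overdetermined system: m = 4 equations, n = 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Solve min_x ||Ax - b||^2.
x, residual, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)

# The residual r = Ax - b is the perturbation we allowed in b.
r = A @ x - b
print(x)                    # ≈ [3.5, 1.4] for this toy data
print(np.linalg.norm(r))
```

For this data the solution works out to $x \approx (3.5,\ 1.4)$; any other $x$ gives a larger residual norm.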
When we compare this to our equation $Bk=f$, we see that this method seems appropriate: we are not entirely confident of $f$, and can allow it to be slightly perturbed.
Looking more closely, $B=A^{T}\,\mathrm{diag}(Ax)$. We are also not entirely certain of $x$, which means we are not entirely certain of $A^{T}\,\mathrm{diag}(Ax)$. This is best reflected in the total least squares approach, in which both the data ($b$ in the simple equation, $f$ in ours) and the matrix ($A$ in the simple equation, $A^{T}\,\mathrm{diag}(Ax)$ in ours) may be slightly perturbed,
$$(A+E)\,x = b + r,$$
where $E$ is some noise in $A$ and $r$ is some noise in $b$, and minimizing $\|[E\ \ r]\|$:
$$\min_{x,\,E,\,r}\ \|[E\ \ r]\| \quad\text{subject to}\quad (A+E)\,x = b + r.$$
The last term in the singular value decomposition of $[A\ \ b]$, $-s_{n+1}u_{n+1}v_{n+1}^{T}$, is precisely what we want for $[E\ \ r]$.
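This SVD recipe can be sketched as follows. The random $A$, the true solution, and the noise level are illustrative assumptions; the closed-form $x_{\mathrm{tls}} = -v_{n+1}(1{:}n)/v_{n+1}(n{+}1)$ is the standard total least squares solution read off the same singular vector:

```python
import numpy as np

np.random.seed(0)
m, n = 10, 3
A = np.random.randn(m, n)
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * np.random.randn(m)   # noisy right-hand side

# SVD of the augmented matrix [A b].
C = np.column_stack([A, b])
U, s, Vt = np.linalg.svd(C)

# The rank-one term -s_{n+1} u_{n+1} v_{n+1}^T is the TLS perturbation [E r].
v = Vt[-1]                            # last right singular vector, v_{n+1}
Er = -s[n] * np.outer(U[:, n], v)
E, r = Er[:, :n], Er[:, n]

# Classical TLS solution from the same singular vector.
x_tls = -v[:n] / v[n]

# By construction (A + E) x_tls = b + r holds up to roundoff.
print(np.linalg.norm((A + E) @ x_tls - (b + r)))
```

The final residual is zero in exact arithmetic because $[A{+}E\ \ b{+}r]$ annihilates the vector $(x_{\mathrm{tls}},\,-1)$, which is a rescaling of $v_{n+1}$.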
At first glance, this is exactly what we want. We can find the singular value decomposition of $[B\ \ f]$, take the last term as $[E\ \ r]$, and solve for $k$. When we implement this method, however, the results compare poorly to the measured data. Standard least squares returns a $k$ with 182.04% error (see [link]); total least squares returns a $k$ with 269.17% error (see [link]).

Looking at the structure of $B=A^{T}\,\mathrm{diag}(Ax)$ and $E$ gives a hint as to why. The adjacency matrix, $A$, encodes information about the structure of the network, so it has a very specific pattern of zeros, which is reflected in $B$. There are no similar restrictions on $E$, allowing nonzero entries in inappropriate places. This is physically equivalent to sprouting a new spring between two nodes, an absurdity. [link] below compares the structure of $B$ ([link]) and $E$ ([link]). Light green entries correspond to zeros; everything else corresponds to a nonzero entry. $E$ has many nonzero entries where there should not be any. Note the scale of the colorbar on the right: the entries of $E$ are two orders of magnitude smaller than the entries of $B$. Though they are small, they represent connections between nodes and springs that do not exist, throwing off the entire result. Requiring that particular entries equal zero makes the problem combinatorially harder.
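The structure problem can be seen concretely in a small sketch. The chain-of-springs incidence matrix, displacements, stiffnesses, and noise level below are all assumptions for illustration; the point is only that the TLS perturbation $E$ is generically dense, so it puts nonzeros where $B$ has structural zeros:

```python
import numpy as np

# Toy chain of 3 springs on 4 nodes: A maps node displacements to
# spring elongations. All values here are illustrative assumptions.
A = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.0, -1.0,  1.0,  0.0],
              [ 0.0,  0.0, -1.0,  1.0]])
x = np.array([0.0, 0.1, 0.3, 0.6])        # assumed displacements
k_true = np.array([2.0, 1.0, 4.0])        # assumed stiffnesses

B = A.T @ np.diag(A @ x)                  # B = A^T diag(Ax), inherits A's zeros
np.random.seed(1)
f = B @ k_true + 1e-3 * np.random.randn(4)   # noisy force data

# TLS perturbation of [B f] from the smallest singular triple.
C = np.column_stack([B, f])
U, s, Vt = np.linalg.svd(C)
Er = -s[-1] * np.outer(U[:, len(s) - 1], Vt[-1])
E = Er[:, :B.shape[1]]

# Count entries where B has a structural zero but E does not.
B_zero = np.isclose(B, 0.0)
violations = B_zero & ~np.isclose(E, 0.0, atol=1e-12)
print(int(violations.sum()), "entries of E land on structural zeros of B")
```

Every such entry corresponds to a spring-node connection that does not exist in the network, which is exactly the absurdity described above.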
Because we would like to use statistical inference, it is important to have a basic understanding of several statistical concepts.
Definition 1 Probability Space
A space, Ω , of all possible events, $\omega \in \Omega $
Example
Rolling a die is an event.
Flipping a coin is an event.
Loading forces onto the spring network is an event.
Definition 2 Random Variable
A mapping from a space of events into the real line, $X:\Omega \to \mathbb{R}$ , or real n -dimensional space, $X:\Omega \to {\mathbb{R}}^{n}$
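Definition 2 says a random variable is nothing more than a function on the sample space. The die example below is a minimal sketch of that idea; the names `X` and `Y` are chosen here for illustration:

```python
# The sample space: outcomes of rolling a die.
omega = ["1", "2", "3", "4", "5", "6"]

# A random variable maps each outcome to a real number,
# e.g. the face value itself.
def X(w):
    return float(w)

# A different random variable on the same space:
# the indicator of "the roll is even".
def Y(w):
    return 1.0 if int(w) % 2 == 0 else 0.0

print([X(w) for w in omega])   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print([Y(w) for w in omega])   # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

The same sample space thus carries many different random variables, one per choice of mapping.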