$$x_n = A + W_n,\qquad n = 1,\ldots,N$$ with $W_n \sim \mathcal{N}(0, \sigma^2)$ iid. The sample mean $\hat{A} = \frac{1}{N}\sum_{n=1}^{N} x_n$ is the MVUB and ML estimator. Now suppose that we have prior knowledge that $-A_0 \le A \le A_0$. We might incorporate this by forming a new estimator that truncates $\hat{A}$ to the known interval: $$\check{A} = \begin{cases} -A_0, & \hat{A} < -A_0 \\ \hat{A}, & -A_0 \le \hat{A} \le A_0 \\ A_0, & \hat{A} > A_0 \end{cases}$$
Let $p_{\hat{A}}(a)$ denote the density of $\hat{A}$. Since $\hat{A} = \frac{1}{N}\sum_{n=1}^{N} x_n$, we have $\hat{A} \sim \mathcal{N}(A, \frac{\sigma^2}{N})$. The density of $\check{A}$ then has point masses at $\pm A_0$ (from the truncation) and follows $p_{\hat{A}}$ in between: $$p_{\check{A}}(a) = \Pr(\hat{A} \le -A_0)\,\delta(a + A_0) + p_{\hat{A}}(a)\,[u(a + A_0) - u(a - A_0)] + \Pr(\hat{A} \ge A_0)\,\delta(a - A_0)$$ where $u(\cdot)$ denotes the unit step.
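A minimal numerical sketch of this truncated estimator (the values of `A_true`, `A0`, `sigma`, and `N` below are hypothetical, chosen only for illustration):

```python
import random

random.seed(1)

A0, A_true, sigma = 1.0, 0.7, 1.0      # hypothetical prior bound, true A, noise std
N = 20

# simulate x_n = A + W_n with W_n ~ N(0, sigma^2)
x = [A_true + random.gauss(0.0, sigma) for _ in range(N)]

A_hat = sum(x) / N                     # sample-mean (MVUB / ML) estimate
A_check = max(-A0, min(A0, A_hat))     # truncate to the prior interval [-A0, A0]
print(A_hat, A_check)
```

The truncation guarantees the estimate respects the prior interval, at the cost of the point masses at $\pm A_0$ noted above.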
$$x_n = A + W_n,\qquad n = 1,\ldots,N$$
The prior distribution allows us to incorporate prior information regarding the unknown parameter; probable values of the parameter are supported by the prior. Basically, the prior reflects what we believe "Nature" will probably throw at us.
$$p(x \mid \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x},\qquad \theta \in [0, 1]$$ which is the Binomial likelihood. $$p(\theta) = \frac{1}{B(\alpha, \beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}$$ which is the Beta prior distribution, with $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.
$$p(x, \theta) = \frac{\binom{n}{x}}{B(\alpha, \beta)}\theta^{\alpha+x-1}(1-\theta)^{n-x+\beta-1}$$
$$p(x) = \binom{n}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha+x)\Gamma(n-x+\beta)}{\Gamma(\alpha+\beta+n)}$$
$$p(\theta \mid x) = \frac{\theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}}{B(\alpha+x,\ \beta+n-x)}$$ which is the Beta density with parameters $\alpha' = \alpha + x$ and $\beta' = \beta + n - x$.
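The conjugate update above can be checked numerically. The sketch below (with arbitrary illustrative values for $\alpha$, $\beta$, $n$, $x$) verifies pointwise that the Beta$(\alpha+x,\ \beta+n-x)$ posterior equals likelihood $\times$ prior divided by the evidence $p(x)$:

```python
import math

def beta_pdf(t, a, b):
    # Beta(a, b) density evaluated at t in (0, 1)
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_B)

def binom_pmf(x, n, t):
    # Binomial likelihood of x successes in n trials with success probability t
    return math.comb(n, x) * t ** x * (1 - t) ** (n - x)

alpha, beta, n, x = 2.0, 3.0, 10, 7       # arbitrary illustrative values

# evidence p(x), from the closed form above
p_x = math.comb(n, x) * math.exp(
    math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    + math.lgamma(alpha + x) + math.lgamma(beta + n - x)
    - math.lgamma(alpha + beta + n))

# conjugacy: posterior should be Beta(alpha + x, beta + n - x)
t = 0.6
lhs = beta_pdf(t, alpha + x, beta + n - x)        # claimed posterior
rhs = binom_pmf(x, n, t) * beta_pdf(t, alpha, beta) / p_x   # Bayes' rule
print(lhs, rhs)
```

The two quantities agree, illustrating that the Beta family is closed under the Binomial update.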
Clearly, the most important objective is to choose the prior $p(\theta)$ that best reflects the prior knowledge available to us. In general, however, our prior knowledge is imprecise and any number of prior densities may aptly capture this information. Moreover, the optimal estimator usually cannot be obtained in closed form.
Therefore, sometimes it is desirable to choose a prior density that models prior knowledge and is nicely matched in functional form to the likelihood $p(x \mid \theta)$, so that the optimal estimator (and posterior density) can be expressed in a simple fashion.
Suppose we want to estimate the variance of a process, incorporating a prior that is amplitude-scale invariant (so that we are invariant to arbitrary amplitude rescaling of the data). $p(s) = \frac{1}{s}$ satisfies this condition: under a rescaling $s \to As$, the transformed density retains the same functional form $\frac{1}{s}$. Such a prior is non-informative, since it is invariant to amplitude scale.
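To see the claimed invariance concretely, the following sketch applies the change of variables $s' = As$ to $p(s) = 1/s$ and checks that the transformed density matches $1/s'$ (the rescaling factor and sample points below are arbitrary):

```python
def prior(s):
    # candidate non-informative prior p(s) = 1/s
    return 1.0 / s

A = 2.5                                  # arbitrary amplitude rescaling factor
errors = []
for s_new in (0.5, 1.0, 3.0):
    # change of variables s' = A*s: the density transforms as p'(s') = p(s'/A) / A
    transformed = prior(s_new / A) / A
    errors.append(abs(transformed - prior(s_new)))
print(errors)
```

The transformed density equals $1/s'$ exactly, so the form of the prior is unchanged by rescaling.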
Given the likelihood $p(x \mid \theta)$, choose the prior $p(\theta)$ so that the posterior $p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)$ has a simple functional form. That is, choose $p(\theta) \in \mathcal{F}$, where $\mathcal{F}$ is a family of densities (the conjugate family) for which the posterior $p(\theta \mid x)$ also belongs to $\mathcal{F}$.
$$x_n = A + W_n,\qquad n = 1,\ldots,N$$ with $W_n \sim \mathcal{N}(0, \sigma^2)$ iid. Rather than modeling $A \sim U(-A_0, A_0)$ (which did not yield a closed-form estimator) consider $$p(A) = \frac{1}{\sqrt{2\pi \sigma_A^2}}\,e^{-\frac{1}{2\sigma_A^2}(A-\mu)^2}$$
With $\mu = 0$ and $\sigma_A = \frac{1}{3}A_0$, this Gaussian prior also reflects the prior knowledge that $|A| \ge A_0$ is unlikely (the interval $[-A_0, A_0]$ spans three standard deviations). The Gaussian prior is also conjugate to the Gaussian likelihood $$p(x \mid A) = \frac{1}{(2\pi \sigma^2)^{N/2}}\,e^{-\frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - A)^2}$$ so that the resulting posterior density is also a simple Gaussian, as shown next.
First note that $$p(x \mid A) = \frac{1}{(2\pi \sigma^2)^{N/2}}\,e^{-\frac{1}{2\sigma^2}\sum_{n=1}^{N} x_n^2}\,e^{-\frac{1}{2\sigma^2}(NA^2 - 2NA\bar{x})}$$ where $\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n$.
Combining this likelihood with the Gaussian prior and completing the square shows that the posterior $p(A \mid x)$ is also Gaussian, with mean $$\hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}\,\bar{x} + \frac{\sigma^2/N}{\sigma_A^2 + \sigma^2/N}\,\mu$$
Small $N$: $\hat{A}$ favors the prior (the prior mean $\mu$ dominates).
Large $N$: $\hat{A}$ favors the data (the sample mean $\bar{x}$ dominates).
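These limiting behaviors follow from the weight $\frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}$ placed on the sample mean, which grows toward 1 as $N$ increases. A minimal simulation (the values of `A_true`, `sigma2`, and `sigma_A2` are hypothetical, with prior mean $\mu = 0$):

```python
import random

random.seed(0)

def bayes_mean_estimate(x, sigma2, sigma_A2, mu=0.0):
    # Posterior-mean estimate of A: Gaussian prior N(mu, sigma_A2),
    # Gaussian noise with variance sigma2.
    N = len(x)
    xbar = sum(x) / N
    w = sigma_A2 / (sigma_A2 + sigma2 / N)   # weight on the data
    return w * xbar + (1.0 - w) * mu

A_true, sigma2, sigma_A2 = 0.8, 1.0, 0.25    # hypothetical true A and variances
weights = []
for N in (1, 10, 1000):
    x = [A_true + random.gauss(0.0, sigma2 ** 0.5) for _ in range(N)]
    w = sigma_A2 / (sigma_A2 + sigma2 / N)
    weights.append(w)
    print(N, round(w, 3), round(bayes_mean_estimate(x, sigma2, sigma_A2), 3))
```

As $N$ grows, the data weight increases from 0.2 toward 1, so the estimate shrinks toward the prior mean for small samples and toward $\bar{x}$ for large ones.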
The multivariate Gaussian model is the most important Bayesian tool in signal processing. It leads directly to the celebrated Wiener and Kalman filters.
Assume that we are dealing with random vectors $x$ and $y$ . We will regard $y$ as a signal vector that is to be estimated from an observation vector $x$ .
$y$ plays the same role as $\theta$ did in earlier discussions. We will assume that $y$ is $p \times 1$ and $x$ is $N \times 1$. Furthermore, assume that $x$ and $y$ are jointly Gaussian distributed: $$\begin{pmatrix}x\\ y\end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}{R}_{\mathrm{xx}} & {R}_{\mathrm{xy}}\\ {R}_{\mathrm{yx}} & {R}_{\mathrm{yy}}\end{pmatrix}\right)$$ That is, $E(x)=0$, $E(y)=0$, $E(xx^T)={R}_{\mathrm{xx}}$, $E(xy^T)={R}_{\mathrm{xy}}$, $E(yx^T)={R}_{\mathrm{yx}}$, $E(yy^T)={R}_{\mathrm{yy}}$. Define $$R \equiv \begin{pmatrix}{R}_{\mathrm{xx}} & {R}_{\mathrm{xy}}\\ {R}_{\mathrm{yx}} & {R}_{\mathrm{yy}}\end{pmatrix}$$
For example, suppose $x = y + W$ with $W \sim \mathcal{N}(0, \sigma^2 I)$ and $$p(y) = \mathcal{N}(0, {R}_{\mathrm{yy}})$$ independent of $W$. Then $E(x) = E(y) + E(W) = 0$, $E(xx^T) = E(yy^T) + E(yW^T) + E(Wy^T) + E(WW^T) = {R}_{\mathrm{yy}} + \sigma^2 I$, and $E(xy^T) = E(yy^T) + E(Wy^T) = {R}_{\mathrm{yy}} = E(yx^T)$. Hence $$\begin{pmatrix}x\\ y\end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}{R}_{\mathrm{yy}} + \sigma^2 I & {R}_{\mathrm{yy}}\\ {R}_{\mathrm{yy}} & {R}_{\mathrm{yy}}\end{pmatrix}\right)$$ From our Bayesian perspective, we are interested in the posterior $p(y \mid x)$.
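For zero-mean jointly Gaussian $x$ and $y$, the posterior mean is $E(y \mid x) = {R}_{\mathrm{yx}}{R}_{\mathrm{xx}}^{-1}x$, which for this model becomes ${R}_{\mathrm{yy}}({R}_{\mathrm{yy}} + \sigma^2 I)^{-1}x$, the Wiener-type estimate. A small NumPy sketch (the dimension `p`, the noise variance, and the construction of `R_yy` are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

p, sigma2 = 3, 0.5                          # illustrative dimension and noise variance

# build an arbitrary valid signal covariance R_yy (symmetric positive definite)
B = rng.standard_normal((p, p))
R_yy = B @ B.T + np.eye(p)

# draw y ~ N(0, R_yy) and observe x = y + W, W ~ N(0, sigma2 * I)
y = rng.multivariate_normal(np.zeros(p), R_yy)
x = y + rng.normal(0.0, np.sqrt(sigma2), p)

# posterior mean E[y | x] = R_yx R_xx^{-1} x = R_yy (R_yy + sigma2 I)^{-1} x
R_xx = R_yy + sigma2 * np.eye(p)
y_hat = R_yy @ np.linalg.solve(R_xx, x)
print(y_hat)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard numerically preferable way to apply ${R}_{\mathrm{xx}}^{-1}$.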