
When the a priori density of a parameter is not known, or the parameter itself is inconveniently described as a random variable, techniques must be developed that make no presumption about the relative possibilities of parameter values. Lacking this knowledge, we can expect the error characteristics of the resulting estimates to be worse than those of estimates that can use it.

The maximum likelihood estimate $\hat{\theta}_{ML}(\mathbf{r})$ of a nonrandom parameter is, simply, that value which maximizes the likelihood function (the a priori density of the observations). Assuming that the maximum can be found by evaluating a derivative, $\hat{\theta}_{ML}(\mathbf{r})$ is defined by

$$\left.\frac{\partial p_{\mathbf{r}}(\mathbf{r};\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_{ML}} = 0$$
The logarithm of the likelihood function may also be used in this maximization.

Often, we cannot assign a probability density to a parameter of a random variable's density; we simply do not know what the parameter's value is. Maximum likelihood estimates are often used in such problems. As a specific case, let $r_l$ be a sequence of independent, identically distributed Gaussian random variables having an unknown mean $\theta$ but a known variance $\sigma_n^2$. The derivative of the logarithm of the likelihood function equals

$$\frac{\partial \ln p_{\mathbf{r}}(\mathbf{r};\theta)}{\partial\theta} = \frac{1}{\sigma_n^2}\sum_{l=0}^{L-1}(r_l - \theta)$$

The solution of this equation is the maximum likelihood estimate, which equals the sample average:

$$\hat{\theta}_{ML} = \frac{1}{L}\sum_{l=0}^{L-1} r_l$$

The expected value of this estimate, $E[\hat{\theta}_{ML}]$, equals the actual value $\theta$, showing that the maximum likelihood estimate is unbiased. The mean-squared error equals $\sigma_n^2/L$, and we infer that this estimate is consistent.
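As a quick numerical check of these properties, here is a minimal sketch (not part of the original text; it assumes NumPy, and the mean, variance, sample size, and seed are arbitrary choices) that forms the sample-average estimate over many realizations and compares the empirical bias and mean-squared error with the predicted values of zero and $\sigma_n^2/L$.

```python
import numpy as np

# Illustrative sketch only: the parameter values, sample size, and seed are
# arbitrary choices, not taken from the text.
rng = np.random.default_rng(0)

theta = 2.0        # true (nonrandom) mean to be estimated
sigma_n = 1.5      # known noise standard deviation
L = 50             # observations per realization
trials = 10_000    # number of Monte Carlo realizations

# Each row holds one realization r_0, ..., r_{L-1}.
r = rng.normal(theta, sigma_n, size=(trials, L))

# Maximum likelihood estimate for each realization: the sample average.
theta_ml = r.mean(axis=1)

print("empirical bias:", theta_ml.mean() - theta)            # ~ 0 (unbiased)
print("empirical MSE: ", np.mean((theta_ml - theta) ** 2))   # ~ sigma_n**2 / L
print("predicted MSE: ", sigma_n ** 2 / L)
```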

Parameter vectors

The maximum likelihood procedure (as well as the others being discussed) can be easily generalized to situations where more than one parameter must be estimated. Letting $\boldsymbol{\theta}$ denote the parameter vector, the likelihood function is now expressed as $p_{\mathbf{r}}(\mathbf{r};\boldsymbol{\theta})$. The maximum likelihood estimate $\hat{\boldsymbol{\theta}}_{ML}$ of the parameter vector is given by the location of the maximum of the likelihood function (or equivalently of its logarithm). Using derivatives, the calculation of the maximum likelihood estimate becomes

$$\left.\nabla_{\boldsymbol{\theta}}\, p_{\mathbf{r}}(\mathbf{r};\boldsymbol{\theta})\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}_{ML}} = \mathbf{0}$$
where $\nabla_{\boldsymbol{\theta}}$ denotes the gradient with respect to the parameter vector. This equation means that we must estimate all of the parameters simultaneously by setting the partial derivative of the likelihood function with respect to each parameter to zero. Given $P$ parameters, we must solve in most cases a set of $P$ nonlinear, simultaneous equations to find the maximum likelihood estimates.
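When these $P$ simultaneous equations cannot be solved in closed form, the maximum of the (log-)likelihood can be located numerically. The following sketch is an illustration under that assumption, not a procedure from the text: it assumes NumPy and SciPy, uses Gaussian data with the two-element parameter vector of the next example, and arbitrary data and starting values; the numerical answer should agree with the closed-form solution derived below.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch only: names, data, and the starting point are arbitrary.
rng = np.random.default_rng(1)
r = rng.normal(2.0, 1.5, size=200)   # observations with unknown mean and variance

def neg_log_likelihood(theta):
    """Negative log-likelihood of Gaussian data; theta = (mean, variance)."""
    mean, var = theta
    if var <= 0:                      # keep the search inside the valid region
        return np.inf
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (r - mean) ** 2 / var)

# Minimizing the negative log-likelihood maximizes the likelihood.
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print("numerical ML estimate: ", result.x)
print("closed-form estimate:  ", r.mean(), r.var())   # r.var() uses the 1/L normalization
```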

Let's extend the previous example to the situation where neither the mean nor the variance of a sequence of independent Gaussian random variables is known. The likelihood function is, in this case,

$$p_{\mathbf{r}}(\mathbf{r};\boldsymbol{\theta}) = \prod_{l=0}^{L-1}\frac{1}{\sqrt{2\pi\theta_2}}\exp\!\left(-\frac{(r_l-\theta_1)^2}{2\theta_2}\right)$$

Evaluating the partial derivatives of the logarithm of this quantity, we find the following set of two equations to solve for $\theta_1$, representing the mean, and $\theta_2$, representing the variance.

The variance, rather than the standard deviation, is represented by $\theta_2$. Were the standard deviation used as the parameter instead, the mathematics would be messier and the estimator would have less attractive properties; a problem illustrates this point.
$$\frac{1}{\theta_2}\sum_{l=0}^{L-1}(r_l-\theta_1) = 0$$

$$-\frac{L}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{l=0}^{L-1}(r_l-\theta_1)^2 = 0$$

The solution of this set of equations is easily found to be

$$\hat{\theta}_{1,ML} = \frac{1}{L}\sum_{l=0}^{L-1} r_l \qquad\qquad \hat{\theta}_{2,ML} = \frac{1}{L}\sum_{l=0}^{L-1}\left(r_l - \hat{\theta}_{1,ML}\right)^2$$

The expected value of $\hat{\theta}_{1,ML}$ equals the actual value of $\theta_1$; thus, this estimate is unbiased. However, the expected value of the estimate of the variance equals $\theta_2\,\frac{L-1}{L}$. The estimate of the variance is biased, but asymptotically unbiased. This bias can be removed by replacing the normalization of $L$ in the averaging computation for $\hat{\theta}_{2,ML}$ by $L-1$.
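A brief numerical illustration of this bias (again a sketch, not from the text, assuming NumPy; sample size and seed are arbitrary): the $1/L$-normalized estimate averages to roughly $\theta_2(L-1)/L$, while the $1/(L-1)$ normalization removes the bias.

```python
import numpy as np

# Illustrative sketch only: sample sizes and seed are arbitrary.
rng = np.random.default_rng(2)
theta_1, theta_2 = 2.0, 1.5 ** 2     # true mean and true variance
L, trials = 10, 100_000

r = rng.normal(theta_1, np.sqrt(theta_2), size=(trials, L))

var_ml = r.var(axis=1, ddof=0)        # 1/L normalization: the ML estimate
var_unbiased = r.var(axis=1, ddof=1)  # 1/(L-1) normalization removes the bias

print("E[ML variance estimate]  ~", var_ml.mean())        # ~ theta_2 * (L-1)/L
print("theta_2 * (L-1)/L        =", theta_2 * (L - 1) / L)
print("E[corrected estimate]    ~", var_unbiased.mean())  # ~ theta_2
```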

Source: OpenStax, Statistical signal processing. OpenStax CNX. Dec 05, 2011. Download for free at http://cnx.org/content/col11382/1.1