
The CRLB states that, under some mild regularity assumptions about the conditional density function $p_{X|Y}(x|y)$, the variance of any unbiased estimator is bounded from below by the inverse of the Fisher information $I(y^*)$ [link], [link], [link]. Recall that an unbiased estimator is any estimator $\widehat{Y}$ that satisfies $E[\widehat{Y}] = y^*$. The CRLB tells us that

$$\operatorname{var}(\widehat{Y}) \;\ge\; \frac{1}{I(y^*)}.$$

If $Y$ is a vector-valued quantity, then the expected negative Hessian matrix (the matrix of second-order partial derivatives) of the log-likelihood function is called the Fisher Information Matrix (FIM), and a similar inequality tells us that the variance of each component of any unbiased estimator of $y^*$ is bounded below by the corresponding diagonal element of the inverse of the FIM. Since the MSE of an unbiased estimator is equal to its variance, we see that the CRLB provides a very useful lower bound on the best MSE performance that we can hope to achieve. Thus, the CRLB is often used as a comparison point for evaluating estimators. It may or may not be possible to achieve the CRLB, but if we find a decision rule that does, we know that it also minimizes the MSE risk among all possible unbiased estimators. In general, it may be difficult to compute the CRLB, but in certain important cases it is possible to find closed-form or computational solutions.
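To make this concrete, consider estimating the mean of a Gaussian distribution with known variance $\sigma^2$ from $n$ independent observations. Here the Fisher information is $I(y^*) = n/\sigma^2$, so the CRLB equals $\sigma^2/n$, and the sample mean attains it. The following Python sketch (an illustration added here, not part of the original text; the particular parameter values are arbitrary) compares the CRLB with a Monte Carlo estimate of the sample mean's variance.

```python
import numpy as np

# Hypothetical setup (not from the text): estimate the mean y* of a
# Gaussian with known variance sigma^2 from n i.i.d. observations.
rng = np.random.default_rng(0)
y_star, sigma, n = 2.0, 1.5, 50

# Fisher information for the Gaussian mean: I(y*) = n / sigma^2,
# so the CRLB on the variance of any unbiased estimator is sigma^2 / n.
fisher_info = n / sigma**2
crlb = 1.0 / fisher_info

# Monte Carlo estimate of the variance of the sample mean, an unbiased
# estimator that attains the CRLB in this particular case.
trials = 20000
estimates = rng.normal(y_star, sigma, size=(trials, n)).mean(axis=1)

print(f"CRLB                 : {crlb:.4f}")
print(f"Sample-mean variance : {estimates.var():.4f}")
```

The two printed numbers agree closely, reflecting the fact that the sample mean is an unbiased estimator whose variance meets the bound.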

Bayesian decision theory

Bayesian Decision Theory provides a formal system for integrating prior knowledge and observed data. For the purposes of illustration we will focus on problems involving continuous variables and observations, but extensions to discrete cases are straightforward (simply replace probability densities with probability mass functions, and integrals with summations). The key elements of Bayesian methods are:

  1. a prior probability density function $p_Y(y)$ describing a priori knowledge of probable states for the quantity $Y$;
  2. the likelihood function $p_{X|Y}(x|y)$, as described above;
  3. the posterior density function $p_{Y|X}(y|x)$.

The posterior density is a function of the prior and the likelihood, obtained according to Bayes' rule:

$$p_{Y|X}(y|x) \;=\; \frac{p_{X|Y}(x|y)\, p_Y(y)}{\int p_{X|Y}(x|y)\, p_Y(y)\, dy}.$$

The posterior is an indicator of probable values for $Y$, based on the prior knowledge and the observation. Several options exist for deriving a specific estimate of $Y$ from the posterior. The mean value of the posterior density is one common choice (commonly called the posterior mean); the posterior mean is the decision rule that minimizes the expected squared error loss (MSE risk). The value of $y$ where the posterior density is maximized is another popular estimator (commonly called the Maximum A Posteriori (MAP) estimator). Note that the denominator of the posterior is independent of $y$, so the MAP estimator is simply the maximizer of the product of the likelihood and the prior. Therefore, if the prior is a constant function, the MAP estimator and the MLE coincide.
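As a numerical sketch of these ideas (an added example; the exponential prior, Gaussian likelihood, observed value, and grid below are assumptions made purely for illustration, not part of the original text), one can evaluate the numerator of Bayes' rule on a grid of $y$ values, normalize it to obtain the posterior, and then read off the posterior mean and MAP estimates; maximizing the likelihood alone corresponds to the flat-prior (MLE) case.

```python
import numpy as np
from scipy.stats import norm, expon

# Hypothetical model for illustration only: a single observation x with
#   likelihood  p_{X|Y}(x | y) = N(x; y, 1)   (Gaussian, known unit variance)
#   prior       p_Y(y)         = Exp(1)       (exponential, so y >= 0)
x_obs = 1.3
y_grid = np.linspace(0.0, 10.0, 4001)
dy = y_grid[1] - y_grid[0]

prior = expon.pdf(y_grid)                            # p_Y(y)
likelihood = norm.pdf(x_obs, loc=y_grid, scale=1.0)  # p_{X|Y}(x_obs | y)

# Bayes' rule: the posterior is the normalized product of likelihood and prior.
numerator = likelihood * prior
posterior = numerator / (numerator.sum() * dy)

# Posterior mean: the decision rule minimizing the expected squared error.
posterior_mean = np.sum(y_grid * posterior) * dy

# MAP estimate: maximizes the posterior, i.e. the product likelihood * prior.
y_map = y_grid[np.argmax(posterior)]

# With a constant (flat) prior the MAP estimator reduces to the MLE.
y_mle = y_grid[np.argmax(likelihood)]

print(f"posterior mean = {posterior_mean:.3f}")
print(f"MAP estimate   = {y_map:.3f}")
print(f"MLE            = {y_mle:.3f}")
```

The grid-based normalization stands in for the integral in the denominator of Bayes' rule; in simple conjugate models the posterior, and hence these estimators, can instead be written in closed form.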

Statistical learning

In all of the methods described above, we assumed some amount of knowledge about the distributions of the observation X and quantity of interest Y. Such knowledge can come from a careful analysis of the physical characteristics of the problem at hand, or it can be gleaned from previous experience. However, there are situations where it is difficult to model the physics of the problem and we may not have enough experience to develop complete and accurate probability models. In such cases, it is natural to adopt a statistical learning approach [link], [link].
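As a minimal sketch of what such an approach can look like in practice (an added example; the affine model, squared-error loss, and synthetic data are assumptions for illustration, not part of the original text), a decision rule can be fit directly to a set of training pairs $(x_i, y_i)$ by minimizing the average loss on the data, without ever writing down $p_{X|Y}$ or $p_Y$:

```python
import numpy as np

# Synthetic training data (for illustration): pairs (x_i, y_i) drawn from an
# unknown relationship; in practice these would come from past experience.
rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, size=200)
y_train = 0.8 * x_train + 0.3 + 0.1 * rng.standard_normal(200)

# Fit an affine decision rule f(x) = a*x + b by minimizing the empirical
# squared-error loss on the training data (ordinary least squares).
A = np.column_stack([x_train, np.ones_like(x_train)])
(a, b), *_ = np.linalg.lstsq(A, y_train, rcond=None)

# The learned rule can now be applied to new observations.
x_new = 0.25
print(f"learned rule: f(x) = {a:.2f}*x + {b:.2f};  f({x_new}) = {a * x_new + b:.2f}")
```

Here the empirical average of the loss over the training data plays the role that the expected risk played in the model-based setting.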

Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3