
In previous chapters, we assumed we knew the mathematical form of the probability distribution for the observations under each model; some of these distributions' parameters were not known, and we developed decision rules to deal with this uncertainty. A more difficult problem occurs when the mathematical form itself is not known precisely. For example, the data may be approximately Gaussian, containing slight departures from the ideal. More radically, so little may be known about an accurate model for the data that we are only willing to assume that they are distributed symmetrically about some value. We develop model evaluation algorithms in this section that tackle both kinds of problems. Be forewarned, however, that solutions to such general models come at a price: the more specific a model can be made that accurately describes a given problem, the better the performance. In other words, the more specific the model, the more the signal processing algorithms can be tailored to fit it, with the obvious result that we enhance the performance. However, if our specific model is in error, our neatly tailored algorithms can lead us drastically astray. Thus, the best approach is to relax those aspects of the model which seem doubtful and to develop algorithms that cope well with worst-case situations should they arise ("And they usually do," echoes every person experienced in the vagaries of data). These considerations lead us to consider nonparametric variations in the probability densities compatible with our assessment of model accuracy and to derive decision rules that minimize the impact of the worst-case situation.

Worst-case probability distributions

In model evaluation problems, there are "optimally" hard problems, those in which the models are the most difficult to distinguish. The impossible problem is to distinguish models that are identical. In this situation, the conditional densities of the observed data are equal and the likelihood ratio is constant for all possible values of the observations. It is obvious that identical models are indistinguishable; this elaboration suggests that, in terms of the likelihood ratio, hard problems are those in which the likelihood ratio is constant. Thus, "hard problems" are those in which the class of conditional probability densities has a constant ratio for wide ranges of observed data values.
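To fix notation for what follows (the symbol $\Lambda$ is our choice here for the likelihood ratio), the identical-model case degenerates completely:

$$\Lambda(r) = \frac{p_{r|\mathcal{M}_1}(r)}{p_{r|\mathcal{M}_0}(r)} = 1 \quad \text{for all } r,$$

so that no observation carries any information for discriminating between the models.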

The most relevant model evaluation problem for us is the discrimination between two models that differ only in the means of statistically independent observations: the conditional densities of each observation are related as $p_{r_l|\mathcal{M}_1}(r_l) = p_{r_l|\mathcal{M}_0}(r_l - m)$. Densities that would make this model evaluation problem hard would satisfy the functional equation

$$p(x - m) = C(m)\,p(x), \quad x \ge m,$$

where $C(m)$ is a quantity depending on the mean $m$ but not on the variable $x$.
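To make explicit why such densities yield a hard problem (a small step we add here, following directly from the functional equation), note that the per-observation likelihood ratio is constant whenever the observation exceeds $m$:

$$\Lambda(x) = \frac{p_{r_l|\mathcal{M}_1}(x)}{p_{r_l|\mathcal{M}_0}(x)} = \frac{p(x - m)}{p(x)} = C(m), \quad x \ge m.$$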

The uniform density does not satisfy this equation, as the domain of the function $p$ is assumed to be infinite.
For the probability densities satisfying this equation, any observed datum having a value greater than $m$ cannot be used to distinguish the two models. If one considers only those zero-mean densities $p$ which are symmetric about the origin, then by symmetry the likelihood ratio would also be constant for $x \le 0$. Hypotheses having these densities could only be distinguished when the observations lie in the interval $[0, m]$; such model evaluation problems are hard!
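The symmetry claim deserves one line of justification (added here for completeness). For $x \le 0$, write $y = -x \ge 0$ and use $p(-u) = p(u)$ together with the functional equation evaluated at the point $m + y \ge m$:

$$\Lambda(x) = \frac{p(x - m)}{p(x)} = \frac{p(m + y)}{p(y)} = \frac{1}{C(m)}, \quad x \le 0,$$

since $p(y) = p\bigl((m + y) - m\bigr) = C(m)\,p(m + y)$.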

From the functional equation, we see that the quantity $C(m)$ must be inversely proportional to $p(m)$: substituting $x = m$ into the equation gives $p(0) = C(m)\,p(m)$, so that $C(m) = p(0)/p(m)$. Incorporating this fact into our functional equation, we find that the only solution is the exponential function:

$$p(z - m) = \frac{p(0)}{p(m)}\,p(z) \implies p(z) \propto e^{-cz}, \quad z \ge 0,$$

for some constant $c > 0$. If we insist that the density satisfying the functional equation be symmetric, the solution is the so-called Laplacian (or double-exponential) density

$$p(z) = \frac{1}{\sqrt{2\sigma^2}}\, e^{-\sqrt{2}|z|/\sigma},$$

which has variance $\sigma^2$. When this density serves as the underlying density for our hard model-testing problem, the likelihood ratio has the form (Huber 1965; Huber 1981; Poor, pp. 175-187)

$$\Lambda(r_l) = \begin{cases} e^{-\sqrt{2}m/\sigma}, & r_l \le 0 \\ e^{\sqrt{2}(2r_l - m)/\sigma}, & 0 \le r_l \le m \\ e^{\sqrt{2}m/\sigma}, & m \le r_l. \end{cases}$$

Indeed, the likelihood ratio is constant over much of the range of values of $r_l$, implying that the two models are very similar over those ranges. This worst-case result will appear repeatedly as we embark on searching for the model evaluation rules that minimize the effect of modeling errors on performance.
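As a quick numerical sanity check (our addition, not part of the original text), the following Python sketch evaluates the likelihood ratio $p(r - m)/p(r)$ for the Laplacian density directly and compares it against the piecewise closed form above; the values of $m$ and $\sigma$ are arbitrary illustrative choices.

import numpy as np

def laplacian_pdf(z, sigma=1.0):
    """Laplacian (double-exponential) density with variance sigma**2."""
    return np.exp(-np.sqrt(2.0) * np.abs(z) / sigma) / np.sqrt(2.0 * sigma**2)

def likelihood_ratio(r, m, sigma=1.0):
    """Likelihood ratio computed directly from its definition, p(r - m) / p(r)."""
    return laplacian_pdf(r - m, sigma) / laplacian_pdf(r, sigma)

def likelihood_ratio_piecewise(r, m, sigma=1.0):
    """The closed-form piecewise expression derived in the text."""
    c = np.sqrt(2.0) / sigma
    return np.where(r <= 0.0, np.exp(-c * m),
                    np.where(r >= m, np.exp(c * m),
                             np.exp(c * (2.0 * r - m))))

m, sigma = 1.0, 1.0
r = np.linspace(-3.0, 4.0, 1001)

# The direct ratio and the piecewise formula should agree everywhere.
assert np.allclose(likelihood_ratio(r, m, sigma),
                   likelihood_ratio_piecewise(r, m, sigma))

print("plateau for r <= 0:", likelihood_ratio(-2.0, m, sigma))  # exp(-sqrt(2) m / sigma)
print("plateau for r >= m:", likelihood_ratio(3.0, m, sigma))   # exp(+sqrt(2) m / sigma)

Running the check confirms the two constant plateaus $e^{\mp\sqrt{2}m/\sigma}$ for $r_l \le 0$ and $r_l \ge m$: outside the interval $[0, m]$, the data cannot help discriminate the models.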

Source: OpenStax, Statistical signal processing. OpenStax CNX. Dec 05, 2011. Download for free at http://cnx.org/content/col11382/1.1
