
Statistical learning methods develop decision rules or estimators based only on a collection of training examples, rather than on predetermined probability models. Statistical learning methods are often said to be distribution-free, since they do not assume particular probability models. The canonical set-up for statistical learning is as follows. We begin with a collection of training examples, $\{(X_i, Y_i)\}_{i=1}^n$, which are assumed to be independently and identically distributed according to an unknown probability distribution $P_{X,Y}(x,y)$. If we knew $P_{X,Y}(x,y)$, then we could compute a desired risk function and design an optimal decision rule using the methods described above. In essence, the training examples give us a glimpse of the underlying distribution, but our knowledge of it is far from complete. We cannot exactly compute a risk function, and therefore we cannot derive a corresponding optimal decision rule.

There are at least two ways to proceed at this point. One possibility is to use the training examples to estimate the joint probability distribution, and then use this estimate to derive a decision rule. Unfortunately, the (general-purpose) problem of estimating a distribution from a limited pool of data is often more difficult than the problem of designing a special-purpose decision rule. For this reason, a second possibility is more commonly favored in practice. Rather than estimating the complete distribution, one can use the training examples to directly design a decision rule. More precisely, perhaps the most common approach is to use the training examples to compute an estimate of the desired risk function.

Suppose that we are interested in minimizing a particular risk function. Recall that the risk is the expected value of a chosen loss function. Let $\ell(\hat{Y}, Y)$ denote the loss, and let $f(X)$ denote a candidate decision function, mapping observations to predictions about $Y$ (i.e., $\hat{Y} = f(X)$). The empirical risk function is constructed from the training examples as follows:

$$\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(X_i), Y_i).$$
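As a concrete illustration, the empirical risk is just an average of per-example losses. The sketch below uses a hypothetical binary-classification setup (my construction, not from the text) with the 0-1 loss to compute $\hat{R}(f)$ for one candidate rule.

```python
import numpy as np

def empirical_risk(f, X, Y, loss):
    """Average loss of decision rule f over the n training examples."""
    return float(np.mean([loss(f(x), y) for x, y in zip(X, Y)]))

# 0-1 loss: 1 if the prediction is wrong, 0 if it is right.
zero_one = lambda y_hat, y: float(y_hat != y)

# Hypothetical training sample: X uniform on [-1, 1], label Y = 1{X > 0}.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=50)
Y = (X > 0).astype(int)

f = lambda x: int(x > 0.2)  # a candidate threshold rule
print(empirical_risk(f, X, Y, zero_one))  # fraction of training errors
```

Any loss function with the signature `loss(y_hat, y)` can be substituted, e.g. squared loss for regression.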

This is simply the average loss of the decision rule $f$ over the set of training examples. Note that since the training examples are independent and identically distributed, the expected value of the empirical risk is equal to the true risk $R(f) = \mathbb{E}[\ell(f(X), Y)]$. Moreover, we know (by the law of large numbers) that the empirical risk tends to the true risk as the size of the training sample increases. These facts lend support to the idea of choosing a decision rule to minimize the empirical risk.
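The law-of-large-numbers behavior can be checked numerically. In the toy distribution below (an assumption for illustration), the true risk of a fixed threshold rule is known in closed form, and the empirical risk computed from ever larger samples approaches it.

```python
import numpy as np

# Toy distribution with a known answer: X ~ Uniform(-1, 1) and Y = 1{X > 0}.
# The rule f(x) = 1{x > 0.2} errs exactly when 0 < X <= 0.2, so its true
# 0-1 risk is R(f) = P(0 < X <= 0.2) = 0.1.
rng = np.random.default_rng(1)

def empirical_risk(n):
    X = rng.uniform(-1.0, 1.0, size=n)
    Y = (X > 0).astype(int)
    Y_hat = (X > 0.2).astype(int)
    return float(np.mean(Y_hat != Y))

for n in (100, 10_000, 1_000_000):
    print(n, empirical_risk(n))  # approaches the true risk 0.1 as n grows
```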

Empirical risk minimization (ERM) is just this process. Given a collection of possible decision rules, say $\mathcal{F}$, ERM selects a decision rule according to

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}(f).$$
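For a finite class of rules, ERM is a direct search: evaluate the empirical risk of every rule and keep the minimizer. A minimal sketch, assuming a hypothetical class of threshold classifiers $f_t(x) = 1\{x > t\}$ (my choice of class, not the text's):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.uniform(-1.0, 1.0, size=n)
Y = (X > 0).astype(int)  # true labels: sign of X

# Finite class F: threshold classifiers f_t(x) = 1{x > t}.
thresholds = np.linspace(-1.0, 1.0, 41)

def emp_risk(t):
    """0-1 empirical risk of the threshold rule f_t on the training set."""
    return float(np.mean((X > t).astype(int) != Y))

# ERM: pick the rule in F with the smallest empirical risk.
t_hat = thresholds[int(np.argmin([emp_risk(t) for t in thresholds]))]
print(t_hat, emp_risk(t_hat))
```

Since the labels here are noiseless, the selected threshold attains zero empirical risk; with a richer class one would simply enumerate (or optimize over) its members the same way.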

The selected rule, $\hat{f}_n$, obviously depends on the given set of training examples, and therefore it is itself a random quantity. The theoretically optimal counterpart to $\hat{f}_n$ is the decision rule that minimizes the true risk

$$f^* = \arg\min_{f \in \mathcal{F}} R(f).$$

The central problem in statistical learning is to quantify how well $\hat{f}_n$ performs relative to $f^*$. Note that $R(f^*) \le R(\hat{f}_n)$, since $f^*$ minimizes the true risk. Thus, one way to gauge the performance of $\hat{f}_n$ relative to $f^*$ is to show that there exist small positive values $\epsilon$ and $\delta$ such that, with probability at least $1 - \delta$, we have $R(\hat{f}_n) - R(f^*) \le \epsilon$.
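Guarantees of this kind can be probed by simulation. In the hypothetical noisy-threshold setup below (all of it my construction for illustration), $R(f_t)$ is known in closed form, so one can repeatedly draw training sets, run ERM, and estimate $\delta = P(R(\hat{f}_n) - R(f^*) > \epsilon)$ for a chosen $\epsilon$.

```python
import numpy as np

# Toy setup: X ~ Uniform(-1, 1); Y = 1{X > 0}, flipped with probability 0.1.
# Over threshold rules f_t(x) = 1{x > t}, the best rule is f* = f_0 with
# true risk R(f*) = 0.1, and R(f_t) = 0.1 + 0.8 * |t| / 2 for t in [-1, 1].
rng = np.random.default_rng(3)
thresholds = np.linspace(-1.0, 1.0, 81)

def true_risk(t):
    # f_t disagrees with the noiseless label on a region of probability |t|/2,
    # where its error rate is 0.9 instead of 0.1.
    return 0.1 + 0.8 * abs(t) / 2.0

def erm(n):
    """Draw a fresh training set of size n and return the ERM threshold."""
    X = rng.uniform(-1.0, 1.0, size=n)
    Y = ((X > 0).astype(int) ^ (rng.random(n) < 0.1)).astype(int)
    risks = [np.mean((X > t).astype(int) != Y) for t in thresholds]
    return thresholds[int(np.argmin(risks))]

# Estimate delta = P(R(f_hat_n) - R(f*) > eps) over repeated training sets.
n, eps, trials = 500, 0.05, 200
excess = [true_risk(erm(n)) - 0.1 for _ in range(trials)]
delta_hat = float(np.mean([e > eps for e in excess]))
print(delta_hat)
```

Rerunning with larger $n$ drives the estimated $\delta$ toward zero for any fixed $\epsilon$, which is exactly the behavior the bounds in this theory quantify.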





Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3
