Throughout this module, let $X$ denote the input to a decision-making process and $Y$ denote the correct response or output (e.g., the value of a parameter, the label of a class, the signal of interest). We assume that $X$ and $Y$ are random variables or random vectors with joint distribution ${P}_{X,Y}(x,y)$ , where $x$ and $y$ denote specific values that may be taken by the random variables $X$ and $Y$ , respectively. The observation $X$ is used to make decisions pertaining to the quantity of interest. For the purposes of illustration, we will focus on the task of determining the value of the quantity of interest. A decision rule for this task is a function $f$ that takes the observation $X$ as input and outputs a prediction of the quantity $Y$ . We denote a decision rule by $\widehat{Y}$ , or by $f\left(X\right)$ when we wish to indicate explicitly the dependence of the decision rule on the observation. We will examine techniques for designing decision rules and for analyzing their performance.
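The accuracy of a decision is measured with a loss function. For example, if our goal is to determine the value of $Y$ , then a loss function takes as inputs the true value $Y$ and the predicted value (the decision) $\widehat{Y}=f\left(X\right)$ and outputs a non-negative real number (the “loss”) reflective of the accuracy of the decision. Two of the most commonly encountered loss functions are the 0/1 loss, ${\ell }_{0/1}(\widehat{Y},Y)={\mathbf{I}}_{\widehat{Y}\ne Y}$ , which equals 1 when the decision is wrong and 0 otherwise, and the squared error loss, ${\ell }_{2}(\widehat{Y},Y)={\parallel \widehat{Y}-Y\parallel }_{2}^{2}$ .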
The 0/1 loss is commonly used in detection and classification problems, and the squared error loss is more appropriate for problems involving the estimation of a continuous parameter. Note that since the inputs to the loss function may be random variables, so is the loss.
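As a concrete illustration, the two losses named above can be written as small Python functions. This is only a sketch: the function names `zero_one_loss` and `squared_error_loss` are hypothetical, and the squared error version assumes the prediction and the true value are given as equal-length sequences of numbers.

```python
def zero_one_loss(y_hat, y):
    """0/1 loss: 1.0 when the decision differs from the truth, else 0.0."""
    return 0.0 if y_hat == y else 1.0


def squared_error_loss(y_hat, y):
    """Squared Euclidean distance between a predicted and a true vector.

    Both arguments are assumed to be equal-length sequences of numbers.
    """
    return sum((a - b) ** 2 for a, b in zip(y_hat, y))


# A wrong class label costs one unit, a correct one costs nothing.
label_loss = zero_one_loss("cat", "dog")

# The estimate (1, 2) of the true vector (0, 4) costs 1^2 + 2^2.
vector_loss = squared_error_loss([1.0, 2.0], [0.0, 4.0])
```

Both functions return non-negative numbers, as a loss function should. When their arguments are random, the returned loss is random as well.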
A risk $R\left(f\right)$ is a function of the decision rule $f$ , and is defined to be the expectation of a loss with respect to the joint distribution ${P}_{X,Y}(x,y)$ . For example, the expected 0/1 loss produces the probability of error risk function; i.e., a simple calculation shows that ${R}_{0/1}\left(f\right)=E\left[{\mathbf{I}}_{f\left(X\right)\ne Y}\right]=\text{Pr}\left(f\left(X\right)\ne Y\right)$ . The expected squared error loss produces the mean squared error (MSE) risk function, ${R}_{2}\left(f\right)=E\left[{\parallel f\left(X\right)-Y\parallel }_{2}^{2}\right]$ .
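Because a risk is an expectation, it can be approximated by averaging the loss over many draws from the joint distribution. The sketch below does this for the 0/1 loss under an assumed toy model that is not part of this module: $Y$ is 0 or 1 with equal probability, $X=Y+N$ with $N$ standard Gaussian noise, and the decision rule thresholds $X$ at 0.5. All function names are hypothetical.

```python
import math
import random


def sample_pair(rng):
    """Draw (x, y) from the assumed toy joint distribution."""
    y = rng.randint(0, 1)          # Y is 0 or 1 with equal probability
    x = y + rng.gauss(0.0, 1.0)    # X = Y + standard Gaussian noise
    return x, y


def threshold_rule(x):
    """Decision rule f(x): decide 1 when x exceeds 0.5, otherwise 0."""
    return 1 if x > 0.5 else 0


def zero_one_loss(y_hat, y):
    return 0.0 if y_hat == y else 1.0


def empirical_risk(rule, loss, n, seed=0):
    """Average the loss of `rule` over n independent draws of (X, Y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x, y = sample_pair(rng)
        total += loss(rule(x), y)
    return total / n


risk_estimate = empirical_risk(threshold_rule, zero_one_loss, n=100_000)

# For this toy model the probability of error is the Gaussian tail
# probability beyond 0.5, which is about 0.3085.
exact_risk = 0.5 * math.erfc(0.5 / math.sqrt(2.0))
```

With this many draws, `risk_estimate` should land close to `exact_risk`, which illustrates that the expected 0/1 loss is the probability of error.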
Optimal decisions are obtained by choosing a decision rule $f$ that minimizes the desired risk function. Given complete knowledge of the probability distributions involved (e.g., ${P}_{X,Y}(x,y)$ ) one can explicitly or numerically design an optimal decision rule, denoted ${f}^{*}$ , that minimizes the risk function.
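When $X$ and $Y$ take only a few values and the joint distribution is known, the numerical design can be carried out by brute force: list every possible decision rule, compute its risk exactly, and keep the best one. The sketch below does this for the 0/1 loss. The joint pmf is made up for illustration and all names are hypothetical.

```python
from itertools import product

# Assumed joint pmf P(x, y) over x in {0, 1, 2} and y in {0, 1}.
joint_pmf = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.10, (1, 1): 0.20,
    (2, 0): 0.05, (2, 1): 0.25,
}

x_values = sorted({x for x, _ in joint_pmf})
y_values = sorted({y for _, y in joint_pmf})


def risk_01(rule):
    """Exact 0/1 risk: total probability of the pairs where rule[x] != y."""
    return sum(p for (x, y), p in joint_pmf.items() if rule[x] != y)


# Every decision rule is one assignment of a label to each value of x.
candidate_rules = [
    dict(zip(x_values, labels))
    for labels in product(y_values, repeat=len(x_values))
]

best_rule = min(candidate_rules, key=risk_01)
best_risk = risk_01(best_rule)
```

For this pmf the search selects, for each $x$, the value of $y$ with the larger joint probability. Exhaustive search is only practical for very small problems, since the number of candidate rules grows exponentially with the number of values $X$ can take.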
The conditional distribution of the observation $X$ given the quantity of interest $Y$ is denoted by ${P}_{X|Y}\left(x|y\right)$ . The conditional distribution ${P}_{X|Y}\left(x|y\right)$ can be viewed as a generative model, probabilistically describing the observations resulting from a given value, $y$ , of the quantity of interest. For example, if $y$ is the value of a parameter, then ${P}_{X|Y}\left(x|y\right)$ is the probability distribution of the observation $X$ when the parameter value is set to $y$ . If $X$ is a continuous random variable with conditional density ${p}_{X|Y}\left(x|y\right)$ or a discrete random variable with conditional probability mass function (pmf) ${p}_{X|Y}\left(x|y\right)$ , then given a value $y$ we can assess the probability of a particular measurement value $x$ by the magnitude of either the conditional density or pmf.
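To make the last point concrete, the sketch below evaluates a conditional density at two candidate measurement values for a fixed $y$. It assumes, purely for illustration, a Gaussian generative model in which $X$ given $Y=y$ has mean $y$ and standard deviation `sigma`. The function name and the numerical values are hypothetical.

```python
import math


def gaussian_conditional_density(x, y, sigma=1.0):
    """Density p(x | y) of the assumed model: X given Y = y is N(y, sigma^2)."""
    z = (x - y) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# With the parameter fixed at y = 1.0, compare two candidate measurements.
near = gaussian_conditional_density(1.1, y=1.0)   # measurement close to y
far = gaussian_conditional_density(3.0, y=1.0)    # measurement far from y
```

Here `near` is larger than `far`, so under this model a measurement of 1.1 is more plausible than a measurement of 3.0 when $y=1$ .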