$$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta).$$

Here, $\nabla_\theta \ell(\theta)$ is, as usual, the vector of partial derivatives of $\ell(\theta)$ with respect to the $\theta_i$'s; and $H$ is an $n$-by-$n$ matrix (actually, $(n+1)$-by-$(n+1)$, assuming that we include the intercept term) called the Hessian, whose entries are given by

$$H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}.$$

Newton's method typically enjoys faster convergence than (batch) gradient descent, and requires many fewer iterations to get very close to the minimum. One iteration of Newton's method can, however, be more expensive than one iteration of gradient descent, since it requires finding and inverting an $n$-by-$n$ Hessian; but so long as $n$ is not too large, it is usually much faster overall. When Newton's method is applied to maximize the logistic regression log likelihood function $\ell(\theta)$, the resulting method is also called Fisher scoring.
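To make the update concrete, here is a minimal NumPy sketch of Newton's method for maximizing the logistic regression log likelihood; the helper names, the fixed iteration count, and the assumption that the design matrix already carries an intercept column are our own illustration, not from the text:

```python
# A minimal sketch of Newton's method (Fisher scoring) for logistic regression.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, num_iters=10):
    """Maximize the log likelihood l(theta) via theta := theta - H^{-1} grad.

    X: (m, n) design matrix, assumed to include a column of ones for the
    intercept; y: (m,) array of labels in {0, 1}.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)                    # predicted probabilities
        grad = X.T @ (y - h)                      # gradient of l(theta)
        H = -(X.T * (h * (1.0 - h))) @ X          # Hessian: -X^T diag(h(1-h)) X
        theta = theta - np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
    return theta
```

Note that each iteration solves an $n$-by-$n$ linear system rather than explicitly inverting $H$, which is the usual way to carry out the $H^{-1} \nabla_\theta \ell(\theta)$ update in practice.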

Generalized linear models

The presentation of the material in this section takes inspiration from Michael I. Jordan, Learning in graphical models (unpublished book draft), and also from McCullagh and Nelder, Generalized Linear Models (2nd ed.).

So far, we've seen a regression example, and a classification example. In the regression example, we had $y \mid x; \theta \sim \mathcal{N}(\mu, \sigma^2)$, and in the classification one, $y \mid x; \theta \sim \text{Bernoulli}(\Phi)$, for some appropriate definitions of $\mu$ and $\Phi$ as functions of $x$ and $\theta$. In this section, we will show that both of these methods are special cases of a broader family of models, called Generalized Linear Models (GLMs). We will also show how other models in the GLM family can be derived and applied to other classification and regression problems.

The exponential family

To work our way up to GLMs, we will begin by defining exponential family distributions. We say that a class of distributions is in the exponential family if it can be written in the form

$$p(y; \eta) = b(y) \exp\left(\eta^T T(y) - a(\eta)\right).$$

Here, $\eta$ is called the natural parameter (also called the canonical parameter) of the distribution; $T(y)$ is the sufficient statistic (for the distributions we consider, it will often be the case that $T(y) = y$); and $a(\eta)$ is the log partition function. The quantity $e^{-a(\eta)}$ essentially plays the role of a normalization constant, that makes sure the distribution $p(y; \eta)$ sums/integrates over $y$ to 1.

A fixed choice of $T$, $a$ and $b$ defines a family (or set) of distributions that is parameterized by $\eta$; as we vary $\eta$, we then get different distributions within this family.
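As a small illustration (a sketch with names of our own choosing, not notation fixed by the text), such a family can be expressed in code by holding $b$, $T$ and $a$ fixed and letting $\eta$ vary:

```python
# Sketch: evaluate an exponential family density p(y; eta) for a fixed
# choice of b, T, and a; eta is the free parameter of the family.
import numpy as np

def exp_family_pmf(y, eta, b, T, a):
    """p(y; eta) = b(y) exp(eta^T T(y) - a(eta)).

    eta and T(y) may be scalars or same-length vectors.
    """
    return b(y) * np.exp(np.dot(eta, T(y)) - a(eta))
```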

We now show that the Bernoulli and the Gaussian distributions are examples of exponential family distributions. The Bernoulli distribution with mean $\Phi$, written $\text{Bernoulli}(\Phi)$, specifies a distribution over $y \in \{0, 1\}$, so that $p(y = 1; \Phi) = \Phi$ and $p(y = 0; \Phi) = 1 - \Phi$. As we vary $\Phi$, we obtain Bernoulli distributions with different means. We now show that this class of Bernoulli distributions, ones obtained by varying $\Phi$, is in the exponential family; i.e., that there is a choice of $T$, $a$ and $b$ so that the exponential family form above becomes exactly the class of Bernoulli distributions.

We write the Bernoulli distribution as:

$$
\begin{aligned}
p(y; \Phi) &= \Phi^y (1 - \Phi)^{1-y} \\
&= \exp\big(y \log \Phi + (1 - y) \log(1 - \Phi)\big) \\
&= \exp\Big(\Big(\log \tfrac{\Phi}{1 - \Phi}\Big) y + \log(1 - \Phi)\Big).
\end{aligned}
$$

Thus, the natural parameter is given by $\eta = \log(\Phi / (1 - \Phi))$. Interestingly, if we invert this definition for $\eta$ by solving for $\Phi$ in terms of $\eta$, we obtain $\Phi = 1/(1 + e^{-\eta})$. This is the familiar sigmoid function! This will come up again when we derive logistic regression as a GLM. To complete the formulation of the Bernoulli distribution as an exponential family distribution, reading off the rewritten density above, we also have

$$T(y) = y, \qquad a(\eta) = -\log(1 - \Phi) = \log(1 + e^{\eta}), \qquad b(y) = 1.$$
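As a quick numeric sanity check, the following sketch (our own illustration; the variable names are assumptions) plugs these choices of $b$, $T$ and $a$ into the exponential family form and confirms that it reproduces the Bernoulli probabilities, and that the sigmoid recovers $\Phi$ from $\eta$:

```python
# Sketch: verify the Bernoulli distribution in exponential family form.
import numpy as np

phi = 0.3                                   # Bernoulli mean (arbitrary choice)
eta = np.log(phi / (1.0 - phi))             # natural parameter eta = log(phi/(1-phi))
assert np.isclose(1.0 / (1.0 + np.exp(-eta)), phi)  # sigmoid inverts eta back to phi

b = lambda y: 1.0                           # b(y) = 1
T = lambda y: y                             # T(y) = y
a = lambda e: np.log(1.0 + np.exp(e))       # a(eta) = log(1 + e^eta) = -log(1 - phi)

for y in (0, 1):
    p = b(y) * np.exp(eta * T(y) - a(eta))  # p(y; eta) = b(y) exp(eta T(y) - a(eta))
    assert np.isclose(p, phi**y * (1.0 - phi)**(1 - y))  # matches Bernoulli(phi)
```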
