\[ T(y) = y, \qquad a(\eta) = -\log(1 - \phi) = \log(1 + e^{\eta}), \qquad b(y) = 1 \]

This shows that the Bernoulli distribution can be written in the form of Equation [link], using an appropriate choice of $T$, $a$ and $b$.
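As a quick sanity check, the following minimal sketch compares the usual Bernoulli probability mass function with its exponential-family form numerically. It assumes the natural parameter is the log-odds, $\eta = \log(\phi / (1 - \phi))$, which is consistent with $a(\eta) = \log(1 + e^{\eta})$ above; the function names are illustrative only.

```python
import math

def bernoulli_pmf(y, phi):
    """Standard form: phi^y * (1 - phi)^(1 - y), for y in {0, 1}."""
    return phi ** y * (1.0 - phi) ** (1 - y)

def bernoulli_exp_family(y, phi):
    """Exponential-family form: b(y) * exp(eta * T(y) - a(eta))."""
    eta = math.log(phi / (1.0 - phi))   # natural parameter (assumed: log-odds)
    T = y                               # sufficient statistic T(y) = y
    a = math.log(1.0 + math.exp(eta))   # log partition function, equals -log(1 - phi)
    b = 1.0                             # b(y) = 1
    return b * math.exp(eta * T - a)

# The two forms should agree for every phi in (0, 1) and y in {0, 1}.
for phi in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, phi), bernoulli_exp_family(y, phi))
```

The check passes because $\exp(\eta y - a(\eta)) = (\phi / (1 - \phi))^y (1 - \phi)$, which is exactly $\phi^y (1 - \phi)^{1 - y}$.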

Let's now move on to consider the Gaussian distribution. Recall that, when deriving linear regression, the value of $\sigma^2$ had no effect on our final choice of $\theta$ and $h_\theta(x)$. Thus, we can choose an arbitrary value for $\sigma^2$ without changing anything. To simplify the derivation below, let's set $\sigma^2 = 1$. (If we leave $\sigma^2$ as a variable, the Gaussian distribution can also be shown to be in the exponential family, where $\eta \in \mathbb{R}^2$ is now a 2-dimensional vector that depends on both $\mu$ and $\sigma$. For the purposes of GLMs, however, the $\sigma^2$ parameter can also be treated by considering a more general definition of the exponential family: $p(y; \eta, \tau) = b(y, \tau) \exp((\eta^T T(y) - a(\eta)) / c(\tau))$. Here, $\tau$ is called the dispersion parameter, and for the Gaussian, $c(\tau) = \sigma^2$; but given our simplification above, we won't need the more general definition for the examples we will consider here.) We then have:

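\[ p(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y - \mu)^2\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} y^2\right) \cdot \exp\left(\mu y - \frac{1}{2}\mu^2\right) \]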

Thus, we see that the Gaussian is in the exponential family, with

\[ \eta = \mu, \qquad T(y) = y, \qquad a(\eta) = \mu^2/2 = \eta^2/2, \qquad b(y) = \frac{1}{\sqrt{2\pi}} \exp(-y^2/2). \]
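The same kind of numerical check works for the unit-variance Gaussian. The sketch below evaluates the density in its usual form and in the exponential-family form with the choices of $\eta$, $T$, $a$ and $b$ listed above; the function names are illustrative only.

```python
import math

def gaussian_pdf(y, mu):
    """Density of N(mu, 1) in its usual form."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def gaussian_exp_family(y, mu):
    """Exponential-family form: b(y) * exp(eta * T(y) - a(eta))."""
    eta = mu                                                 # natural parameter
    T = y                                                    # sufficient statistic
    a = eta ** 2 / 2.0                                       # log partition function
    b = math.exp(-(y ** 2) / 2.0) / math.sqrt(2.0 * math.pi) # b(y)
    return b * math.exp(eta * T - a)

# The two forms should agree for any real y and mu.
for mu in (-2.0, 0.0, 1.5):
    for y in (-1.0, 0.0, 3.0):
        assert math.isclose(gaussian_pdf(y, mu), gaussian_exp_family(y, mu))
```

The agreement is just the expansion $-\frac{1}{2}(y - \mu)^2 = -\frac{1}{2}y^2 + \mu y - \frac{1}{2}\mu^2$ carried out term by term.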

There are many other distributions that are members of the exponential family: the multinomial (which we'll see later); the Poisson (for modelling count data; also see the problem set); the gamma and the exponential (for modelling continuous, non-negative random variables, such as time intervals); the beta and the Dirichlet (for distributions over probabilities); and many more. In the next section, we will describe a general “recipe” for constructing models in which $y$ (given $x$ and $\theta$) comes from any of these distributions.

Constructing GLMs

Suppose you would like to build a model to estimate the number $y$ of customers arriving in your store (or the number of page-views on your website) in any given hour, based on certain features $x$ such as store promotions, recent advertising, weather, day-of-week, etc. We know that the Poisson distribution usually gives a good model for numbers of visitors. Knowing this, how can we come up with a model for our problem? Fortunately, the Poisson is an exponential family distribution, so we can apply a Generalized Linear Model (GLM). In this section, we will describe a method for constructing GLMs for problems such as these.
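The text does not derive the exponential-family form of the Poisson here, so the following is only a sketch based on the standard result. Writing the Poisson probability mass function with mean $\lambda$ as $e^{-\lambda} \lambda^y / y! = (1/y!) \exp(y \log \lambda - \lambda)$ suggests $\eta = \log \lambda$, $T(y) = y$, $a(\eta) = e^{\eta}$ and $b(y) = 1/y!$. These choices, and the function names, are assumptions made for illustration.

```python
import math

def poisson_pmf(y, lam):
    """Standard form: exp(-lam) * lam^y / y!, for y = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def poisson_exp_family(y, lam):
    """Exponential-family form: b(y) * exp(eta * T(y) - a(eta))."""
    eta = math.log(lam)           # natural parameter (assumed: log of the mean)
    T = y                         # sufficient statistic
    a = math.exp(eta)             # log partition function, equals lam
    b = 1.0 / math.factorial(y)   # b(y) = 1 / y!
    return b * math.exp(eta * T - a)

# The two forms should agree for any lam > 0 and non-negative integer y.
for lam in (0.5, 3.0, 10.0):
    for y in (0, 1, 4, 12):
        assert math.isclose(poisson_pmf(y, lam), poisson_exp_family(y, lam))
```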

More generally, consider a classification or regression problem where we would like to predict the value of some random variable $y$ as a function of $x$. To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of $y$ given $x$ and about our model:

  1. $y \mid x; \theta \sim \mathrm{ExponentialFamily}(\eta)$. I.e., given $x$ and $\theta$, the distribution of $y$ follows some exponential family distribution, with parameter $\eta$.
  2. Given $x$, our goal is to predict the expected value of $T(y)$ given $x$. In most of our examples, we will have $T(y) = y$, so this means we would like the prediction $h(x)$ output by our learned hypothesis $h$ to satisfy $h(x) = \mathrm{E}[y \mid x]$. (Note that this assumption is satisfied in the choices for $h_\theta(x)$ for both logistic regression and linear regression. For instance, in logistic regression, we had $h_\theta(x) = p(y = 1 \mid x; \theta) = 0 \cdot p(y = 0 \mid x; \theta) + 1 \cdot p(y = 1 \mid x; \theta) = \mathrm{E}[y \mid x; \theta]$.)
  3. The natural parameter $\eta$ and the inputs $x$ are related linearly: $\eta = \theta^T x$. (Or, if $\eta$ is vector-valued, then $\eta_i = \theta_i^T x$.)
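To make the three assumptions concrete, here is a minimal sketch of how they combine into a hypothesis $h_\theta(x) = \mathrm{E}[y \mid x; \theta]$. It computes $\eta = \theta^T x$ (assumption 3) and then maps $\eta$ to the mean of the chosen distribution (assumptions 1 and 2). The Gaussian and Bernoulli mappings follow from the parameterizations given earlier ($\mu = \eta$, and $\phi = 1 / (1 + e^{-\eta})$ from the log-odds form). The Poisson mapping $\lambda = e^{\eta}$ is an assumption based on the standard result, since it is not derived in this section. All names are hypothetical.

```python
import math

def linear_predictor(theta, x):
    """Assumption 3: the natural parameter is eta = theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))

# Maps the natural parameter eta to the mean E[y; eta] for each family.
MEAN_FROM_ETA = {
    "gaussian": lambda eta: eta,                             # mu = eta
    "bernoulli": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # phi = sigmoid(eta)
    "poisson": lambda eta: math.exp(eta),                    # lam = exp(eta), assumed
}

def glm_hypothesis(theta, x, family):
    """Assumptions 1 and 2: predict h(x) = E[y | x; theta] for the given family."""
    eta = linear_predictor(theta, x)
    return MEAN_FROM_ETA[family](eta)

# Example with made-up parameters; x[0] = 1.0 plays the role of an intercept term.
theta = [0.5, -1.0]
x = [1.0, 2.0]
for family in ("gaussian", "bernoulli", "poisson"):
    print(family, glm_hypothesis(theta, x, family))
```

With the Gaussian choice this reduces to the linear regression hypothesis $h_\theta(x) = \theta^T x$, and with the Bernoulli choice it reduces to the logistic regression hypothesis.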





Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
