That’s the [inaudible] but, yes. You are assuming that the error has zero mean. Which is, yeah, right. I think later this quarter we get to some of the other things, but for now just think of this as a mathematically – it’s actually not an unreasonable assumption. I guess, in machine learning all the assumptions we make are almost never true in the absolute sense, right? Because, for instance, housing prices are priced to dollars and cents, so the errors in prices are not continuous-valued random variables, because houses can only be priced at a certain number of dollars and a certain number of cents and you never have fractions of cents in housing prices. Whereas a Gaussian random variable would. So in that sense, the assumptions we make are never “absolutely true,” but for practical purposes this is an accurate enough assumption that it’ll be useful to make. Okay? I think in a week or two, we’ll actually come back to say a bit more about the assumptions we make and when they help our learning algorithms and when they hurt our learning algorithms. We’ll say a bit more about it when we talk about generative and discriminative learning algorithms, like, in a week or two. Okay?
So let’s point out one bit of notation, which is that when I wrote this down I actually wrote P of YI given XI and then semicolon theta, and I’m going to use this notation when we are not thinking of theta as a random variable. So in statistics, this is sometimes called the frequentist point of view, where you think of there as being some, sort of, true value of theta that’s out there that’s generating the data, say, but we don’t know what theta is, but theta is not a random variable, right? So it’s not like there’s some random value of theta out there. It’s that theta is – there’s some true value of theta out there. It’s just that we don’t know what the true value of theta is. So if theta is not a random variable, then I’m going to avoid writing P of YI given XI comma theta, because this would mean the probability of YI conditioned on XI and theta, and you can only condition on random variables. So at this part of the class, where we’re taking sort of the frequentist viewpoint rather than the Bayesian viewpoint, we’re thinking of theta not as a random variable, but just as something we’re trying to estimate, and we use the semicolon notation. So the way to read this is, this is the probability of YI given XI and parameterized by theta. Okay? So you read the semicolon as “parameterized by.” And in the same way here, I’ll say YI given XI parameterized by theta is distributed Gaussian with that.
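Written out in symbols, the semicolon notation might look like the following. This is only a sketch: it assumes the linear model in which YI equals theta transpose XI plus a Gaussian error with zero mean and variance sigma squared. The symbol sigma and the superscript indexing are assumptions here, since the board work is not captured in the transcript.

```latex
p\left(y^{(i)} \mid x^{(i)}; \theta\right)
  = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left(-\frac{\left(y^{(i)} - \theta^{T} x^{(i)}\right)^{2}}{2\sigma^{2}}\right),
\qquad
y^{(i)} \mid x^{(i)}; \theta \;\sim\; \mathcal{N}\!\left(\theta^{T} x^{(i)},\, \sigma^{2}\right)
```

The semicolon marks theta as a fixed but unknown parameter of the density, not as something being conditioned on.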
All right. So we’re gonna make one more assumption. Let’s assume that the error terms are IID, okay? Which stands for Independently and Identically Distributed. So I’m going to assume that the error terms are independent of each other, right? The identically distributed part just means that I’m assuming they all come from the same Gaussian distribution with the same variance, but the more important part of this is that I’m assuming that the epsilon I’s are independent of each other. Now, let’s talk about how to fit a model. The probability of Y given X parameterized by theta – I’m actually going to give this another name. I’m going to write this down and we’ll call this the likelihood of theta, defined as the probability of Y given X parameterized by theta. And so, because the errors are independent, this is going to be the product over my training set of the probability of YI given XI parameterized by theta, like that. Which is, in turn, going to be a product of those Gaussian densities that I wrote down just now, right? Okay?
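To make the product-of-densities idea concrete, here is a minimal sketch in Python with NumPy. It assumes the linear model in which the mean of YI is theta transpose XI and the errors are IID Gaussian with standard deviation sigma. The function names, the toy data, and the choice of sigma equal to 1 are all hypothetical and are not from the lecture.

```python
import numpy as np


def gaussian_density(y, mean, sigma):
    """Density of a Gaussian with the given mean and standard deviation, evaluated at y."""
    return np.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)


def likelihood(theta, X, y, sigma=1.0):
    """Likelihood of theta: the product over the training set of p(y_i | x_i; theta).

    Assumes IID Gaussian errors, so the joint density factors into a product.
    """
    means = X @ theta  # assumed linear model: the mean of y_i is theta^T x_i
    return np.prod(gaussian_density(y, means, sigma))


# Hypothetical toy training set; the first column of X is the intercept term.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

theta_good = np.array([1.0, 1.0])  # reproduces the toy targets exactly
theta_bad = np.array([0.0, 0.5])   # fits the toy targets poorly

L_good = likelihood(theta_good, X, y)
L_bad = likelihood(theta_bad, X, y)
print(L_good > L_bad)
```

With many training examples the raw product underflows toward zero, so in practice one would usually sum the logs of the densities instead of multiplying them.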