# 1.1 The maximum likelihood estimation method

$f(x_1, x_2, \dots, x_n) = f(x_1)\,f(x_2)\cdots f(x_n). \qquad (1)$

The pdf of the joint distribution shown in (1), viewed as a function of the unknown parameters for the observed sample, is known as the likelihood function. If the sample were not drawn independently, the joint pdf could not be written in such a simple form, because the covariances among the members of the sample would, in general, not be zero. The logarithm of this function (usually called the log of the likelihood function) is the sum

$L(x_1, x_2, \dots, x_n) = \ln f(x_1) + \ln f(x_2) + \cdots + \ln f(x_n) = \sum_{i=1}^{n} \ln f(x_i).$

The maximum likelihood method chooses as estimators of the unknown parameters of the distribution the values that maximize the likelihood function. Because the logarithm is a monotonically increasing function, maximizing the log of the likelihood function is equivalent to maximizing the likelihood function. (A function $g(y)$ is monotonically increasing in $y$ if $g'(y) > 0$. Since $\frac{d}{dx}\ln x = \frac{1}{x} > 0$ for $x > 0$, the logarithm is monotonically increasing for positive values of $x$.) The following example of this procedure illustrates how to derive ML estimators.
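A small numerical check of the monotonicity argument may help. The sketch below assumes a made-up sample and a normal pdf with a known standard deviation $\sigma = 1$; both are illustrative assumptions, not values from the text. It evaluates the likelihood (the product) and the log of the likelihood (the sum) on a grid of candidate means. Both should peak at the same grid point.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Joint pdf of an independent sample: the product f(x1) f(x2) ... f(xn)."""
    return math.prod(normal_pdf(x, mu, sigma) for x in data)

def log_likelihood(data, mu, sigma):
    """Log of the joint pdf: the sum ln f(x1) + ... + ln f(xn)."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

# Illustrative sample (an assumption, not data from the text); sigma is treated as known.
sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]
sigma = 1.0
grid = [i / 100 for i in range(300, 701)]  # candidate means 3.00, 3.01, ..., 7.00

best_by_likelihood = max(grid, key=lambda m: likelihood(sample, m, sigma))
best_by_log_likelihood = max(grid, key=lambda m: log_likelihood(sample, m, sigma))

# Both searches should select the same grid point, the one nearest the sample mean.
print(best_by_likelihood, best_by_log_likelihood)
```

With large samples the raw product can underflow to `0.0` in floating point. That is a practical reason, beyond algebraic convenience, to work with the sum of logs.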

## The ML estimator of the population mean and population variance.

Assume that $x \sim N(\mu, \sigma^2)$. Consider a sample of size $n$ drawn independently from this distribution. The likelihood function is the product of the pdfs of the individual observations:

$f(x_i) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} \;\Rightarrow\; f(x_1, x_2, \dots, x_n) = f(x_1)f(x_2)\cdots f(x_n) = \frac{1}{\sigma^{n}(2\pi)^{\frac{n}{2}}} e^{-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}}.$

Thus, the log of the likelihood function of this sample is $L(x_1, x_2, \dots, x_n) = -\frac{n\ln 2\pi}{2} - n\ln\sigma - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}$. In the ML method we want to find the estimators of the mean and standard deviation, $\hat{\mu}$ and $\hat{\sigma}$, that maximize the log of the likelihood function. Substituting the parameter estimates into the log of the likelihood function gives our problem as:

$\max_{\hat{\mu},\hat{\sigma}} L(x_1, x_2, \dots, x_n) = \max_{\hat{\mu},\hat{\sigma}} \left[-\frac{n\ln 2\pi}{2} - n\ln\hat{\sigma} - \frac{\sum (x_i-\hat{\mu})^2}{2\hat{\sigma}^2}\right].$
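The step from the product of normal densities to the closed-form log of the likelihood function can be checked numerically. The sketch below compares the sum of log-densities, computed with Python's `statistics.NormalDist`, against the expression $-\frac{n\ln 2\pi}{2} - n\ln\sigma - \frac{\sum(x_i-\mu)^2}{2\sigma^2}$ at a few arbitrary parameter values. The sample and the parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def loglik_by_sum(data, mu, sigma):
    """Sum of ln f(x_i), using the standard library's normal density."""
    dist = NormalDist(mu, sigma)
    return sum(math.log(dist.pdf(x)) for x in data)

def loglik_closed_form(data, mu, sigma):
    """-n ln(2 pi)/2 - n ln(sigma) - sum((x_i - mu)^2) / (2 sigma^2)."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -n * math.log(2 * math.pi) / 2 - n * math.log(sigma) - ss / (2 * sigma ** 2)

sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]  # illustrative values, not from the text
for mu, sigma in [(5.0, 1.0), (4.5, 0.7), (6.0, 2.5)]:
    a = loglik_by_sum(sample, mu, sigma)
    b = loglik_closed_form(sample, mu, sigma)
    print(f"mu={mu}, sigma={sigma}: sum of logs {a:.10f}, closed form {b:.10f}")
```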

Setting the derivatives of the log of the likelihood function with respect to $\hat{\mu}$ and $\hat{\sigma}$ equal to 0 gives:

$\frac{\partial L(x_1, x_2, \dots, x_n)}{\partial \hat{\mu}} = \frac{2\sum (x_i-\hat{\mu})}{2\hat{\sigma}^2} = 0$

and

$\frac{\partial L(x_1, x_2, \dots, x_n)}{\partial \hat{\sigma}} = -\frac{n}{\hat{\sigma}} + \frac{\sum (x_i-\hat{\mu})^2}{\hat{\sigma}^3} = 0.$

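Solving these two equations simultaneously gives:

$\hat{\mu} = \frac{\sum x_i}{n} = \overline{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{\sum (x_i-\hat{\mu})^2}{n}.$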

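Notice that the ML estimator of the population mean is equal to the sample mean, the same result you found in your introductory statistics course. However, the unbiased estimator of the population variance used in that course is $s^2 = \frac{\sum (x_i-\hat{\mu})^2}{n-1}$, whereas the ML estimator divides by $n$ instead of $n-1$. Because $E(s^2) = \sigma^2$, the ML estimator has expected value $E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$, so it systematically underestimates the population variance.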
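The two variance estimators differ only in the divisor, so $\hat{\sigma}^2 = \frac{n-1}{n}s^2$. The sketch below computes $\hat{\mu} = \sum x_i / n$ and $\hat{\sigma}^2 = \sum (x_i-\hat{\mu})^2 / n$ directly and compares them with Python's `statistics` module. There, `pvariance` divides by $n$ and `variance` divides by $n-1$. The sample values are assumed for illustration.

```python
import statistics

def ml_normal_estimates(data):
    """Closed-form ML estimates for N(mu, sigma^2): sample mean and n-divisor variance."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat

sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]  # illustrative values, not from the text
n = len(sample)
mu_hat, sigma2_hat = ml_normal_estimates(sample)
s2 = statistics.variance(sample)  # unbiased estimator, divisor n - 1

print(mu_hat, statistics.mean(sample))          # ML mean vs. sample mean
print(sigma2_hat, statistics.pvariance(sample))  # ML variance vs. n-divisor variance
print(sigma2_hat, (n - 1) / n * s2)             # ML variance vs. rescaled s^2
```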

Thus, one of the common "problems" with ML estimators is that they are often biased estimators of a population parameter. On the other hand, under very general conditions ML estimators are consistent, are asymptotically efficient, and have an asymptotically normal distribution. These are desirable large-sample characteristics of potential estimators and are discussed in advanced statistics courses. Intuitively, these concepts mean that as the sample size increases the estimator becomes more precise (its variance becomes smaller and any bias disappears) and the distribution of the estimator approaches the normal distribution.

The formal definitions of these terms involve advanced statistical concepts and are reported here only in the interest of completeness. An estimator $\hat{\theta}$ of the parameter $\theta$ is consistent if and only if $\operatorname{plim}\hat{\theta} = \theta$. This estimator has an asymptotically normal distribution if $\hat{\theta} \stackrel{a}{\to} N\left(\theta, \{I(\theta)\}^{-1}\right)$, where $I(\theta)$ is the Fisher information. An unbiased estimator is more efficient than another unbiased estimator if it has a smaller variance than the alternative estimator. The mean square error (MSE) is defined to be $MSE(\hat{\theta}) = E\left[(\hat{\theta}-\theta)^2\right] = V(\hat{\theta}) + \left(Bias[\hat{\theta}]\right)^2$. An estimator whose MSE tends to zero as the sample size increases, $\lim_{n\to\infty} MSE(\hat{\theta}) = 0$, is consistent, because convergence in mean square implies convergence in probability. An estimator is asymptotically efficient if its asymptotic variance equals $\{I(\theta)\}^{-1}$, the Cramér-Rao lower bound on the variance of unbiased estimators. See any advanced statistics text or Statistical terminology for further information on these concepts.
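These large-sample properties can be made concrete for $\hat{\sigma}^2$. For a normal sample, $\sum(x_i-\overline{x})^2/\sigma^2$ has a chi-square distribution with $n-1$ degrees of freedom. Hence $Bias[\hat{\sigma}^2] = -\sigma^2/n$ and $V(\hat{\sigma}^2) = 2(n-1)\sigma^4/n^2$. Then $MSE(\hat{\sigma}^2) = (2n-1)\sigma^4/n^2$, which tends to zero as $n$ grows. The sketch below checks these formulas by Monte Carlo simulation. The true parameter values, sample sizes, number of replications, and seed are arbitrary choices for illustration.

```python
import random

def ml_variance(data):
    """ML estimator of sigma^2: mean squared deviation from the sample mean (divisor n)."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / n

def exact_bias_and_mse(n, sigma2):
    """Theoretical bias and MSE of the ML variance estimator for normal data."""
    bias = -sigma2 / n
    mse = (2 * n - 1) * sigma2 ** 2 / n ** 2
    return bias, mse

def simulated_bias_and_mse(n, mu, sigma, reps, rng):
    """Monte Carlo estimates of the bias and MSE of the ML variance estimator."""
    sigma2 = sigma ** 2
    estimates = [ml_variance([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    bias = sum(estimates) / reps - sigma2
    mse = sum((e - sigma2) ** 2 for e in estimates) / reps
    return bias, mse

rng = random.Random(2010)  # fixed seed so the run is reproducible
mu, sigma = 0.0, 2.0       # assumed true parameters
results = {}
for n in (5, 20, 80):
    results[n] = (exact_bias_and_mse(n, sigma ** 2),
                  simulated_bias_and_mse(n, mu, sigma, reps=4000, rng=rng))
    (eb, em), (sb, sm) = results[n]
    print(f"n={n:3d}  bias: exact {eb:+.4f}, sim {sb:+.4f}   "
          f"MSE: exact {em:.4f}, sim {sm:.4f}")
```

Both the bias and the MSE shrink toward zero as $n$ increases. That is the sense in which the biased ML estimator is still consistent.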

## Application of the ML method to regressions

The discussion above illustrates the basics of the ML method: you form the log of the likelihood function and then find the values of the parameter estimates that maximize this function. In most cases the maximization will not yield answers in closed form; that is, you cannot find a neat algebraic formula as we did for the population mean. However, you can use computer programs to search numerically for the values of the parameter estimates that maximize this function. Thus, with advanced regression models you will often treat the ML method as a “black box” and not concern yourself with the estimation details. Nevertheless, I illustrate one more example of the ML technique.
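To give a feel for what such a numerical search does, here is a minimal sketch of one derivative-free approach. It applies golden-section search to one parameter at a time (coordinate ascent) on the normal log-likelihood. This is a technique chosen here for illustration, not what any particular statistical package uses. The normal case is used only because the answer can be checked against the closed-form estimators $\hat{\mu} = \overline{x}$ and $\hat{\sigma} = \sqrt{\sum(x_i-\overline{x})^2/n}$. The search runs over $\ln\sigma$ so that $\sigma$ stays positive, and the data are assumed not to be all equal.

```python
import math

def golden_section_max(f, a, b, tol=1e-9):
    """Maximize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:   # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:         # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

def normal_loglik(data, mu, sigma):
    """-n ln(2 pi)/2 - n ln(sigma) - sum((x_i - mu)^2) / (2 sigma^2)."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -n * math.log(2 * math.pi) / 2 - n * math.log(sigma) - ss / (2 * sigma ** 2)

def ml_normal_numeric(data, sweeps=3):
    """Coordinate ascent: maximize over mu, then over ln(sigma), and repeat."""
    lo, hi = min(data), max(data)
    spread = hi - lo  # sigma-hat cannot exceed the range of the data
    mu, log_sigma = (lo + hi) / 2, math.log(spread)
    for _ in range(sweeps):
        mu = golden_section_max(
            lambda m: normal_loglik(data, m, math.exp(log_sigma)), lo, hi)
        log_sigma = golden_section_max(
            lambda t: normal_loglik(data, mu, math.exp(t)),
            math.log(spread) - 15, math.log(spread))
    return mu, math.exp(log_sigma)

sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]  # illustrative values, not from the text
mu_num, sigma_num = ml_normal_numeric(sample)
n = len(sample)
mu_closed = sum(sample) / n
sigma_closed = math.sqrt(sum((x - mu_closed) ** 2 for x in sample) / n)
print(mu_num, mu_closed)
print(sigma_num, sigma_closed)
```

The numerical and closed-form answers agree to several decimal places. Precision is limited by how finely differences in the log-likelihood can be resolved in floating point near the peak. In models without closed forms, the same kind of search is the only route to the estimates.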

## The ML estimators for a simple regression.

Assume that we want to estimate the population parameters for the regression model ${y}_{i}=\beta {x}_{i}+{\epsilon }_{i},$ where we assume that

1. $\epsilon_i \sim N(0, \sigma^2),$
2. $E\left({\epsilon }_{i}{\epsilon }_{j}\right)=0$ for $i\ne j,$
3. ${y}_{i}={Y}_{i}-\overline{Y}$ and ${x}_{i}={X}_{i}-\overline{X}$ (this assumption allows us to ignore the estimation of the intercept term), and
4. ${x}_{i}$ is a non-stochastic variable.

The assumption of a normally distributed error term implies that $\epsilon_i = y_i - \beta x_i \sim N(0, \sigma^2)$. Thus, the pdf of the error term is $f(\epsilon_i) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(y_i-\beta x_i)^2}{2\sigma^2}}$. Writing $\prod_{i=1}^{n} x_i$ for the product $x_1 x_2 \cdots x_n$, the likelihood function is:

$\prod _{i=1}^{n}f\left({\epsilon }_{i}\right)=\prod _{i=1}^{n}\frac{1}{\sigma \sqrt{2\pi }}{e}^{-\frac{{\left({y}_{i}-\beta {x}_{i}\right)}^{2}}{2{\sigma }^{2}}}={\left(\frac{1}{\sigma \sqrt{2\pi }}\right)}^{n}\prod _{i=1}^{n}{e}^{-\frac{{\left({y}_{i}-\beta {x}_{i}\right)}^{2}}{2{\sigma }^{2}}}$

and the log of the likelihood function is $L(\epsilon_1, \epsilon_2, \dots, \epsilon_n) = -n\ln\sqrt{2\pi} - n\ln\hat{\sigma} - \frac{\sum_{i=1}^{n}(y_i-\hat{\beta}x_i)^2}{2\hat{\sigma}^2}.$

We find the estimators $\hat{\beta}$ and $\hat{\sigma}$ in the same manner as we did for the population mean and variance. Differentiating the log of the likelihood function and setting these first derivatives equal to 0 gives the following two first-order conditions:

$\frac{\partial L(\epsilon_1, \epsilon_2, \dots, \epsilon_n)}{\partial \hat{\beta}} = \frac{2\sum_{i=1}^{n}(y_i-\hat{\beta}x_i)x_i}{2\hat{\sigma}^2} = 0$

and

$\frac{\partial L(\epsilon_1, \epsilon_2, \dots, \epsilon_n)}{\partial \hat{\sigma}} = -\frac{n}{\hat{\sigma}} + \frac{\sum_{i=1}^{n}(y_i-\hat{\beta}x_i)^2}{\hat{\sigma}^3} = 0.$

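Thus, the ML estimators are:

$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \quad \text{and} \quad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(y_i-\hat{\beta}x_i)^2}{n}.$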

Notice that in this simple case the ML estimator of $\beta$ is the same as the OLS estimator of $\beta$. Also, notice that the ML estimator of $\sigma^2$ is biased. The (unbiased) OLS estimator of $\sigma^2$ is $s^2 = \frac{\sum_{i=1}^{n}(y_i-\hat{\beta}x_i)^2}{n-2}$. Its divisor is $n-2$ because two parameters are estimated: the slope, and the intercept that was removed by centering the data.
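A minimal sketch of the regression case, assuming a small made-up data set. It centers the data as in assumption 3 and computes $\hat{\beta} = \sum x_i y_i / \sum x_i^2$ and $\hat{\sigma}^2 = \sum(y_i-\hat{\beta}x_i)^2/n$, the solutions of the two first-order conditions. It then contrasts the ML variance estimate (divisor $n$) with the OLS estimate $s^2$ (divisor $n-2$). The function and variable names are illustrative.

```python
def centered(values):
    """Deviations from the mean, as in assumption 3 (y_i = Y_i - Ybar, x_i = X_i - Xbar)."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def ml_simple_regression(X, Y):
    """ML estimates of beta and sigma^2 for y_i = beta * x_i + e_i on centered data."""
    x, y = centered(X), centered(Y)
    n = len(x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    residuals = [yi - beta_hat * xi for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    sigma2_ml = sse / n     # ML estimator (biased)
    s2_ols = sse / (n - 2)  # unbiased OLS estimator
    return beta_hat, sigma2_ml, s2_ols, x, residuals

# Illustrative data (an assumption, not from the text)
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

beta_hat, sigma2_ml, s2_ols, x, residuals = ml_simple_regression(X, Y)
# First-order condition for beta: sum of residual * x_i should be zero up to rounding.
score = sum(r * xi for r, xi in zip(residuals, x))
print(beta_hat, sigma2_ml, s2_ols, score)
```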

You can use the examples in this module as the basis of your understanding of the ML method. When you see that a computer program uses the ML method, you can be fairly certain that the program uses one of the many optimizing subroutines to find the maximum of the log of the likelihood function. You can consult the program's help files to see which underlying distribution is used to set up the log of the likelihood function. A concept related to the maximum likelihood estimation method worth exploring is the likelihood ratio test (see the module by Don Johnson entitled The Likelihood Ratio Test for an introduction to this key statistical test).

## Exercises

Consider the following functions. For each of them, (1) prove that the function is a pdf, (2) calculate the mean and variance of the distribution, and (3) find the maximum likelihood estimator of the parameter $\theta$. Sketch a graph of each of the distributions for a representative value of $\theta$.

1. $f(x;\theta) = (\theta+1)x^{\theta}$ where $0 \le x \le 1$ and $\theta > 0.$
2. $f\left(x;\theta \right)=\theta {e}^{-\theta x}$ where $0\le x<\infty$ and $\theta >0.$

Source: OpenStax, Econometrics for honors students. OpenStax CNX. Jul 20, 2010 Download for free at http://cnx.org/content/col11208/1.2