# 3.6 Asymptotic distribution of maximum likelihood estimators

This course is a short series of lectures on Introductory Statistics. Topics covered are listed in the Table of Contents. The notes were prepared by Ewa Paszek and Marek Kimmel. The development of this course has been supported by NSF grant 0203396.

## Asymptotic distribution of maximum likelihood estimators

Let us consider a distribution with p.d.f. $f\left(x;\theta \right)$ such that the parameter $\theta$ is not involved in the support of the distribution. We want to be able to find the maximum likelihood estimator $\stackrel{^}{\theta }$ by solving $\frac{\partial \left[\mathrm{ln}L\left(\theta \right)\right]}{\partial \theta }=0,$ where the partial derivative is used because $L\left(\theta \right)$ involves ${x}_{1},{x}_{2},...,{x}_{n}$ as well as $\theta$ .

That is, $\frac{\partial \left[\mathrm{ln}L\left(\stackrel{^}{\theta }\right)\right]}{\partial \theta }=0,$ where now, with $\stackrel{^}{\theta }$ in this expression, $L\left(\stackrel{^}{\theta }\right)=f\left({X}_{1};\stackrel{^}{\theta }\right)f\left({X}_{2};\stackrel{^}{\theta }\right)···f\left({X}_{n};\stackrel{^}{\theta }\right).$

We can approximate the left-hand member of this latter equation by a linear function found from the first two terms of a Taylor’s series expanded about $\theta$ , namely $\frac{\partial \left[\mathrm{ln}L\left(\theta \right)\right]}{\partial \theta }+\left(\stackrel{^}{\theta }-\theta \right)\frac{{\partial }^{2}\left[\mathrm{ln}L\left(\theta \right)\right]}{\partial {\theta }^{2}}\approx 0,$ when $L\left(\theta \right)=f\left({X}_{1};\theta \right)f\left({X}_{2};\theta \right)···f\left({X}_{n};\theta \right).$

Obviously, this approximation is good only if $\stackrel{^}{\theta }$ is close to $\theta$ , and an adequate mathematical proof involves those conditions. But a heuristic argument can be made by solving for $\stackrel{^}{\theta }-\theta$ to obtain

$\stackrel{^}{\theta }-\theta \approx \frac{-\partial \left[\mathrm{ln}L\left(\theta \right)\right]/\partial \theta }{{\partial }^{2}\left[\mathrm{ln}L\left(\theta \right)\right]/\partial {\theta }^{2}}.\qquad \left(1\right)$

Recall that $\mathrm{ln}L\left(\theta \right)=\mathrm{ln}f\left({X}_{1};\theta \right)+\mathrm{ln}f\left({X}_{2};\theta \right)+···+\mathrm{ln}f\left({X}_{n};\theta \right)$ and

$\frac{\partial \mathrm{ln}L\left(\theta \right)}{\partial \theta }=\sum _{i=1}^{n}\frac{\partial \left[\mathrm{ln}f\left({X}_{i};\theta \right)\right]}{\partial \theta }.\qquad \left(2\right)$

Expression (2) is the sum of the n independent and identically distributed random variables ${Y}_{i}=\frac{\partial \left[\mathrm{ln}f\left({X}_{i};\theta \right)\right]}{\partial \theta },\; i=1,2,...,n,$ and thus, by the Central Limit Theorem, it has an approximate normal distribution with mean (in the continuous case) equal to

$\begin{array}{l}\underset{-\infty }{\overset{\infty }{\int }}\frac{\partial \left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial \theta }f\left(x;\theta \right)dx=\underset{-\infty }{\overset{\infty }{\int }}\frac{\partial \left[f\left(x;\theta \right)\right]}{\partial \theta }\frac{f\left(x;\theta \right)}{f\left(x;\theta \right)}dx=\underset{-\infty }{\overset{\infty }{\int }}\frac{\partial \left[f\left(x;\theta \right)\right]}{\partial \theta }dx\\ =\frac{\partial }{\partial \theta }\left[\underset{-\infty }{\overset{\infty }{\int }}f\left(x;\theta \right)dx\right]=\frac{\partial }{\partial \theta }\left[1\right]=0.\end{array}$
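This zero-mean property of the score can be checked numerically. The sketch below (Python, standard library only; the exponential density with $\theta = 2$ is an assumed illustration, not part of the original notes) approximates $\int \partial[\ln f]/\partial\theta \cdot f\,dx$ by the midpoint rule and confirms it is essentially zero.

```python
import math

def f(x, theta):
    """Exponential p.d.f. f(x; theta) = (1/theta) * exp(-x/theta)."""
    return math.exp(-x / theta) / theta

def score(x, theta, h=1e-6):
    """Numerical derivative of ln f(x; theta) with respect to theta."""
    return (math.log(f(x, theta + h)) - math.log(f(x, theta - h))) / (2 * h)

def mean_score(theta, upper=100.0, steps=100_000):
    """Midpoint-rule approximation of the integral of score * f over (0, upper)."""
    dx = upper / steps
    return sum(score((i + 0.5) * dx, theta) * f((i + 0.5) * dx, theta) * dx
               for i in range(steps))

print(mean_score(2.0))  # essentially 0
```

The derivative is taken numerically on purpose, so the same check works for any density whose support does not involve $\theta$.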

Clearly, a mathematical condition is needed that makes it permissible to interchange the operations of integration and differentiation in those last steps. Of course, the integral of $f\left(x;\theta \right)$ is equal to one because it is a p.d.f.

Since we know that the mean of each ${Y}_{i}$ is $\underset{-\infty }{\overset{\infty }{\int }}\frac{\partial \left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial \theta }f\left(x;\theta \right)dx=0,$ let us take the derivative of each member of this equation with respect to $\theta$ , obtaining

$\underset{-\infty }{\overset{\infty }{\int }}\left\{\frac{{\partial }^{2}\left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial {\theta }^{2}}f\left(x;\theta \right)+\frac{\partial \left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial \theta }\frac{\partial \left[f\left(x;\theta \right)\right]}{\partial \theta }\right\}dx=0.$

However, $\frac{\partial \left[f\left(x;\theta \right)\right]}{\partial \theta }=\frac{\partial \left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial \theta }f\left(x;\theta \right),$ so $\underset{-\infty }{\overset{\infty }{\int }}{\left\{\frac{\partial \left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial \theta }\right\}}^{2}f\left(x;\theta \right)dx=-\underset{-\infty }{\overset{\infty }{\int }}\frac{{\partial }^{2}\left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial {\theta }^{2}}f\left(x;\theta \right)dx.$
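The equality of these two expectations (two forms of the same information quantity) can be illustrated by Monte Carlo. A minimal sketch, using the exponential p.d.f. $f(x;\theta)=(1/\theta)e^{-x/\theta}$ with $\theta = 2$ as an assumed example; its log-derivatives $-1/\theta + x/\theta^2$ and $1/\theta^2 - 2x/\theta^3$ are plugged in directly:

```python
import random

random.seed(0)
theta = 2.0
n = 200_000
xs = [random.expovariate(1 / theta) for _ in range(n)]   # exponential, mean theta

# E{ (d[ln f]/dtheta)^2 }  with  d[ln f]/dtheta = -1/theta + x/theta^2
lhs = sum((-1 / theta + x / theta**2) ** 2 for x in xs) / n

# -E{ d2[ln f]/dtheta2 }  with  d2[ln f]/dtheta2 = 1/theta^2 - 2x/theta^3
rhs = -sum(1 / theta**2 - 2 * x / theta**3 for x in xs) / n

print(lhs, rhs, 1 / theta**2)  # both estimates are close to 1/theta^2 = 0.25
```

Note that `random.expovariate` takes the rate $1/\theta$, not the mean.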

Since $E\left(Y\right)=0$ , this last expression provides the variance of $Y=\partial \left[\mathrm{ln}f\left(X;\theta \right)\right]/\partial \theta .$ The variance of expression (2) is then n times this value, namely

$-nE\left\{\frac{{\partial }^{2}\left[\mathrm{ln}f\left(X;\theta \right)\right]}{\partial {\theta }^{2}}\right\}.$

Let us rewrite (1) as

$\frac{\sqrt{n}\left(\stackrel{^}{\theta }-\theta \right)}{1/\sqrt{-E\left\{{\partial }^{2}\left[\mathrm{ln}f\left(X;\theta \right)\right]/\partial {\theta }^{2}\right\}}}=\frac{\frac{\partial \left[\mathrm{ln}L\left(\theta \right)\right]/\partial \theta }{\sqrt{-nE\left\{{\partial }^{2}\left[\mathrm{ln}f\left(X;\theta \right)\right]/\partial {\theta }^{2}\right\}}}}{\frac{-\frac{1}{n}\frac{{\partial }^{2}\left[\mathrm{ln}L\left(\theta \right)\right]}{\partial {\theta }^{2}}}{E\left\{-{\partial }^{2}\left[\mathrm{ln}f\left(X;\theta \right)\right]/\partial {\theta }^{2}\right\}}}\qquad \left(4\right)$

The numerator of (4) has an approximate $N\left(0,1\right)$ distribution, and the unstated mathematical conditions require, in some sense, that $-\frac{1}{n}\frac{{\partial }^{2}\left[\mathrm{ln}L\left(\theta \right)\right]}{\partial {\theta }^{2}}$ converge to $E\left[-{\partial }^{2}\left[\mathrm{ln}f\left(X;\theta \right)\right]/\partial {\theta }^{2}\right]$ . Accordingly, the ratio given in equation (4) must be approximately $N\left(0,1\right)$ . That is, $\stackrel{^}{\theta }$ has an approximate normal distribution with mean $\theta$ and standard deviation $\frac{1}{\sqrt{-nE\left\{{\partial }^{2}\left[\mathrm{ln}f\left(X;\theta \right)\right]/\partial {\theta }^{2}\right\}}}$ .

With the underlying exponential p.d.f. $f\left(x;\theta \right)=\frac{1}{\theta }{e}^{-x/\theta },\; 0<x<\infty ,$ $\overline{X}$ is the maximum likelihood estimator. Since $\mathrm{ln}f\left(x;\theta \right)=-\mathrm{ln}\theta -\frac{x}{\theta }$ and $\frac{\partial \left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial \theta }=-\frac{1}{\theta }+\frac{x}{{\theta }^{2}}$ and $\frac{{\partial }^{2}\left[\mathrm{ln}f\left(x;\theta \right)\right]}{\partial {\theta }^{2}}=\frac{1}{{\theta }^{2}}-\frac{2x}{{\theta }^{3}}$ , we have $-E\left[\frac{1}{{\theta }^{2}}-\frac{2X}{{\theta }^{3}}\right]=-\frac{1}{{\theta }^{2}}+\frac{2\theta }{{\theta }^{3}}=\frac{1}{{\theta }^{2}}$ because $E\left(X\right)=\theta$ . That is, $\overline{X}$ has an approximate normal distribution with mean $\theta$ and standard deviation $\theta /\sqrt{n}$ . Thus the random interval $\overline{X}±1.96\left(\theta /\sqrt{n}\right)$ has an approximate probability of 0.95 of covering $\theta$ . Substituting the observed $\overline{x}$ for $\theta$ , as well as for $\overline{X}$ , we say that $\overline{x}±1.96\overline{x}/\sqrt{n}$ is an approximate 95% confidence interval for $\theta$ .
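The 0.95 coverage claim is easy to check by simulation. A minimal sketch (Python; the choices $\theta = 5$, $n = 100$, and 2000 trials are arbitrary, not from the notes):

```python
import random

random.seed(1)
theta, n, trials = 5.0, 100, 2000
covered = 0
for _ in range(trials):
    xs = [random.expovariate(1 / theta) for _ in range(n)]
    xbar = sum(xs) / n
    half = 1.96 * xbar / n**0.5          # plug-in half-width 1.96 * xbar / sqrt(n)
    covered += (xbar - half <= theta <= xbar + half)
print(covered / trials)  # roughly 0.95
```

The empirical coverage falls a little below 0.95 for small n because $\overline{x}$ is substituted for $\theta$ in the standard deviation.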

The maximum likelihood estimator for $\lambda$ in the Poisson p.d.f. $f\left(x;\lambda \right)=\frac{{\lambda }^{x}{e}^{-\lambda }}{x!},\; x=0,1,2,...,\; \lambda \in \Omega =\left\{\lambda :0<\lambda <\infty \right\},$ is $\stackrel{^}{\lambda }=\overline{X}$ . Now $\mathrm{ln}f\left(x;\lambda \right)=x\mathrm{ln}\lambda -\lambda -\mathrm{ln}x!$ and $\frac{\partial \left[\mathrm{ln}f\left(x;\lambda \right)\right]}{\partial \lambda }=\frac{x}{\lambda }-1$ and $\frac{{\partial }^{2}\left[\mathrm{ln}f\left(x;\lambda \right)\right]}{\partial {\lambda }^{2}}=-\frac{x}{{\lambda }^{2}}$ . Thus $-E\left(-\frac{X}{{\lambda }^{2}}\right)=\frac{\lambda }{{\lambda }^{2}}=\frac{1}{\lambda },$ and $\stackrel{^}{\lambda }=\overline{X}$ has an approximate normal distribution with mean $\lambda$ and standard deviation $\sqrt{\lambda /n}$ . Finally, $\overline{x}±1.645\sqrt{\overline{x}/n}$ serves as an approximate 90% confidence interval for $\lambda$ . With the data from example(…), $\overline{x}=2.225$ and hence this interval runs from 1.887 to 2.563.
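The interval computation itself is one line. In the sketch below the sample size n = 50 is a made-up value (the n behind the quoted interval is not stated here), so the endpoints come out slightly different from the 1.887 to 2.563 above:

```python
import math

def poisson_ci(xbar, n, z=1.645):
    """Approximate CI for lambda: xbar +/- z * sqrt(xbar / n)."""
    half = z * math.sqrt(xbar / n)
    return xbar - half, xbar + half

# n = 50 is a hypothetical sample size, chosen only for illustration.
lo, hi = poisson_ci(2.225, 50)
print(round(lo, 3), round(hi, 3))  # 1.878 2.572 with this assumed n
```

Replacing z = 1.645 by 1.96 gives the corresponding 95% interval.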

It is interesting that there is another theorem, somewhat related to the preceding result, in which the variance of $\stackrel{^}{\theta }$ serves as a lower bound for the variance of every unbiased estimator of $\theta$ . Thus we know that if a certain unbiased estimator has a variance equal to that lower bound, we cannot find a better one, and hence it is the best in the sense of being the unbiased minimum variance estimator. This is called the Rao-Cramer Inequality.

Let ${X}_{1},{X}_{2},...,{X}_{n}$ be a random sample from a distribution with p.d.f. $f\left(x;\theta \right),\; \theta \in \Omega =\left\{\theta :c<\theta <d\right\},$ where the support of $X$ does not depend upon $\theta$ , so that we can differentiate with respect to $\theta$ under integral signs like that in the following integral:

$\underset{-\infty }{\overset{\infty }{\int }}f\left(x;\theta \right)dx=1.$

If $Y=u\left({X}_{1},{X}_{2},...,{X}_{n}\right)$ is an unbiased estimator of $\theta$ , then

$Var\left(Y\right)\ge \frac{1}{n\underset{-\infty }{\overset{\infty }{\int }}{\left\{\left[\partial \mathrm{ln}f\left(x;\theta \right)/\partial \theta \right]\right\}}^{2}f\left(x;\theta \right)dx}=\frac{-1}{n\underset{-\infty }{\overset{\infty }{\int }}\left[{\partial }^{2}\mathrm{ln}f\left(x;\theta \right)/\partial {\theta }^{2}\right]f\left(x;\theta \right)dx}.$

Note that the two integrals in the respective denominators are the expectations $E\left\{{\left[\frac{\partial \mathrm{ln}f\left(X;\theta \right)}{\partial \theta }\right]}^{2}\right\}$ and $E\left[\frac{{\partial }^{2}\mathrm{ln}f\left(X;\theta \right)}{\partial {\theta }^{2}}\right]$ ; sometimes one is easier to compute than the other.

Note that the lower bound was computed above for two distributions: the exponential and the Poisson. Those respective lower bounds were ${\theta }^{2}/n$ and $\lambda /n$ . Since in each case the variance of $\overline{X}$ equals the lower bound, $\overline{X}$ is the unbiased minimum variance estimator.

The sample arises from a distribution with p.d.f. $f\left(x;\theta \right)=\theta {x}^{\theta -1},\; 0<x<1,\; \theta \in \Omega =\left\{\theta :0<\theta <\infty \right\}.$

We have $\mathrm{ln}f\left(x;\theta \right)=\mathrm{ln}\theta +\left(\theta -1\right)\mathrm{ln}x,\frac{\partial \mathrm{ln}f\left(x;\theta \right)}{\partial \theta }=\frac{1}{\theta }+\mathrm{ln}x,$ and $\frac{{\partial }^{2}\mathrm{ln}f\left(x;\theta \right)}{\partial {\theta }^{2}}=-\frac{1}{{\theta }^{2}}.$

Since $E\left(-1/{\theta }^{2}\right)=-1/{\theta }^{2}$ , the lower bound of the variance of every unbiased estimator of $\theta$ is ${\theta }^{2}/n$ . Moreover, the maximum likelihood estimator $\stackrel{^}{\theta }=-n/\mathrm{ln}\prod _{i=1}^{n}{X}_{i}$ has an approximate normal distribution with mean $\theta$ and variance ${\theta }^{2}/n$ . Thus, in a limiting sense, $\stackrel{^}{\theta }$ is the unbiased minimum variance estimator of $\theta$ .
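This last example can be simulated directly. A minimal sketch (Python; $\theta = 3$ and n = 400 are arbitrary choices) draws from $f(x;\theta)=\theta x^{\theta-1}$ by inversion and computes the maximum likelihood estimate together with the approximate 95% interval $\stackrel{^}{\theta }\pm 1.96\,\stackrel{^}{\theta }/\sqrt{n}$ suggested by the limiting variance ${\theta }^{2}/n$ :

```python
import math
import random

random.seed(2)
theta, n = 3.0, 400

# Draw from f(x; theta) = theta * x**(theta - 1) on (0, 1) by inversion:
# the c.d.f. is x**theta, so X = U**(1/theta) with U uniform on (0, 1).
xs = [random.random() ** (1 / theta) for _ in range(n)]

theta_hat = -n / sum(math.log(x) for x in xs)   # MLE: -n / ln(product of X_i)
se = theta_hat / math.sqrt(n)                   # plug-in for sqrt(theta^2 / n)
print(theta_hat, (theta_hat - 1.96 * se, theta_hat + 1.96 * se))
```

Here $-\mathrm{ln}{X}_{i}$ is exponential with mean $1/\theta$ , which is why the estimate lands close to $\theta = 3$ .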

To measure the value of estimators, their variances are compared to the Rao-Cramer lower bound. The ratio of the Rao-Cramer lower bound to the actual variance of any unbiased estimator is called the efficiency of that estimator. An estimator with an efficiency of 50% requires 1/0.5 = 2 times as many sample observations to do as well in estimation as can be done with the unbiased minimum variance estimator (the 100% efficient estimator).
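The efficiency arithmetic can be written out directly; a trivial sketch, with hypothetical variance values:

```python
def efficiency(rao_cramer_bound, actual_variance):
    """Efficiency of an unbiased estimator: bound / variance (at most 1)."""
    return rao_cramer_bound / actual_variance

# A hypothetical unbiased estimator whose variance is twice the bound:
eff = efficiency(1.0, 2.0)
print(eff, 1 / eff)  # efficiency 0.5, so 2x the observations are needed
```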
