In the preceding considerations (Confidence Intervals I), the confidence interval for the mean $\mu $ of a normal distribution was found under the assumption that the standard deviation $\sigma $ is known. In most applications, however, the value of $\sigma $ is unknown, although in some cases one might have a very good idea of its value.
Suppose that the underlying distribution is normal and that ${\sigma}^{2}$ is unknown. It can be shown that, given a random sample ${X}_{1},{X}_{2},\mathrm{...},{X}_{n}$ from a normal distribution, the statistic $$T=\frac{\overline{X}-\mu}{S/\sqrt{n}}$$ has a t distribution with $r=n-1$ degrees of freedom, where ${S}^{2}$ is the usual unbiased estimator of ${\sigma}^{2}$ (see t distribution).
Select ${t}_{\alpha /2}\left(n-1\right)$ so that $$P\left[T\ge {t}_{\alpha /2}\left(n-1\right)\right]=\alpha /2.$$ Then
$$\begin{array}{l}1-\alpha =P\left[-{t}_{\alpha /2}\left(n-1\right)\le \frac{\overline{X}-\mu}{S/\sqrt{n}}\le {t}_{\alpha /2}\left(n-1\right)\right]\\ =P\left[-{t}_{\alpha /2}\left(n-1\right)\frac{S}{\sqrt{n}}\le \overline{X}-\mu \le {t}_{\alpha /2}\left(n-1\right)\frac{S}{\sqrt{n}}\right]\\ =P\left[-\overline{X}-{t}_{\alpha /2}\left(n-1\right)\frac{S}{\sqrt{n}}\le -\mu \le -\overline{X}+{t}_{\alpha /2}\left(n-1\right)\frac{S}{\sqrt{n}}\right]\\ =P\left[\overline{X}-{t}_{\alpha /2}\left(n-1\right)\frac{S}{\sqrt{n}}\le \mu \le \overline{X}+{t}_{\alpha /2}\left(n-1\right)\frac{S}{\sqrt{n}}\right].\end{array}$$
Thus the observations of a random sample provide $\overline{x}$ and ${s}^{2}$ , and $\left[\overline{x}-{t}_{\alpha /2}\left(n-1\right)\frac{s}{\sqrt{n}},\overline{x}+{t}_{\alpha /2}\left(n-1\right)\frac{s}{\sqrt{n}}\right]$ is a $100\left(1-\alpha \right)\%$ confidence interval for $\mu $ .
Let X equal the amount of butterfat, in pounds, produced by a typical cow during a 305-day milk production period between her first and second calves. Assume the distribution of X is $N\left(\mu ,{\sigma}^{2}\right)$ . To estimate $\mu $ , a farmer measures the butterfat production for n = 20 cows, yielding the following data:
481 | 537 | 513 | 583 | 453 | 510 | 570 |
500 | 457 | 555 | 618 | 327 | 350 | 643 |
499 | 421 | 505 | 637 | 599 | 392 | - |
For these data, $\overline{x}=507.50$ and $s=89.75$ . Thus a point estimate of $\mu $ is $\overline{x}=507.50$ . Since ${t}_{0.05}\left(19\right)=1.729$ , a 90% confidence interval for $\mu $ is $507.50\pm 1.729\left(\frac{89.75}{\sqrt{20}}\right)$ , or equivalently, [472.80, 542.20].
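The arithmetic of this example can be checked with a minimal sketch. It assumes the summary statistics quoted above ( $\overline{x}=507.50$ , $s=89.75$ , $n=20$ ) and takes the critical value ${t}_{0.05}\left(19\right)=1.729$ from the table rather than computing it; the function name `t_confidence_interval` is a hypothetical helper introduced only for illustration.

```python
import math

def t_confidence_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n).

    t_crit is the tabulated value t_{alpha/2}(n-1); it is supplied by the
    caller (read from a t table), not computed here.
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Summary statistics and critical value quoted in the butterfat example.
lower, upper = t_confidence_interval(xbar=507.50, s=89.75, n=20, t_crit=1.729)
print(f"90% confidence interval for mu: [{lower:.2f}, {upper:.2f}]")
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; a different confidence level only requires a different table entry.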
Let T have a t distribution with n -1 degrees of freedom. Then ${t}_{\alpha /2}\left(n-1\right)>{z}_{\alpha /2}$ . Consequently, the interval $\overline{x}\pm {z}_{\alpha /2}\sigma /\sqrt{n}$ is expected to be shorter than the interval $\overline{x}\pm {t}_{\alpha /2}\left(n-1\right)s/\sqrt{n}$ . After all, more information, namely the value of $\sigma $ , is used in constructing the first interval. However, the length of the second interval depends strongly on the value of s . If the observed s is smaller than $\sigma $ , the second scheme can produce the shorter confidence interval. But on average, $\overline{x}\pm {z}_{\alpha /2}\sigma /\sqrt{n}$ is the shorter of the two confidence intervals.
If it is not possible to assume that the underlying distribution is normal and $\mu $ and $\sigma $ are both unknown, approximate confidence intervals for $\mu $ can still be constructed using $$T=\frac{\overline{X}-\mu}{S/\sqrt{n}},$$ which now has only an approximate t distribution.
Generally, this approximation is quite good for many nonnormal distributions, in particular if the underlying distribution is symmetric, unimodal, and of the continuous type. However, if the distribution is highly skewed, there is great danger in using this approximation. In such a situation, it would be safer to use a nonparametric method to find a confidence interval for the median of the distribution.
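The comparison of the two interval lengths can be illustrated with a small simulation. This is only a sketch under stated assumptions: samples of size n = 20 are drawn from a normal distribution with an arbitrarily chosen $\sigma =1$ , the tabulated value ${t}_{0.05}\left(19\right)=1.729$ is used for the t interval, and ${z}_{0.05}$ is computed from the standard normal distribution. The function name, the seed, and the number of trials are illustrative choices.

```python
import math
import random
import statistics

def compare_interval_lengths(n=20, sigma=1.0, t_crit=1.729, trials=2000, seed=0):
    """Compare the fixed z-interval length with simulated t-interval lengths.

    Returns (z_length, mean_t_length, share_t_shorter), where share_t_shorter
    is the fraction of samples whose t interval is shorter than the z interval.
    """
    rng = random.Random(seed)
    # z_{0.05}: upper 5% point of the standard normal (about 1.645).
    z_crit = statistics.NormalDist().inv_cdf(0.95)
    z_length = 2 * z_crit * sigma / math.sqrt(n)

    t_lengths = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
        t_lengths.append(2 * t_crit * s / math.sqrt(n))

    mean_t_length = statistics.fmean(t_lengths)
    share_t_shorter = sum(length < z_length for length in t_lengths) / trials
    return z_length, mean_t_length, share_t_shorter

z_length, mean_t_length, share_t_shorter = compare_interval_lengths()
print(f"z interval length:            {z_length:.3f}")
print(f"average t interval length:    {mean_t_length:.3f}")
print(f"share of shorter t intervals: {share_t_shorter:.2f}")
```

In such a run the t interval is shorter in some individual samples, namely those with a small observed s, while its average length exceeds the fixed length of the z interval.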
The confidence interval for the variance ${\sigma}^{2}$ is based on the sample variance $${S}^{2}=\frac{1}{n-1}{\displaystyle \sum _{i=1}^{n}{\left({X}_{i}-\overline{X}\right)}^{2}}.$$
To find a confidence interval for ${\sigma}^{2}$ , we use the fact that the distribution of $\left(n-1\right){S}^{2}/{\sigma}^{2}$ is ${\chi}^{2}\left(n-1\right)$ . The constants a and b should be selected from the tabulated chi-square distribution with n -1 degrees of freedom such that $$P\left(a\le \frac{\left(n-1\right){S}^{2}}{{\sigma}^{2}}\le b\right)=1-\alpha .$$
That is, select a and b so that the probabilities in the two tails are equal: $$a={\chi}_{1-\alpha /2}^{2}\left(n-1\right)$$ and $$b={\chi}_{\alpha /2}^{2}\left(n-1\right).$$ Then, solving the inequalities, we have $$1-\alpha =P\left(\frac{a}{\left(n-1\right){S}^{2}}\le \frac{1}{{\sigma}^{2}}\le \frac{b}{\left(n-1\right){S}^{2}}\right)=P\left(\frac{\left(n-1\right){S}^{2}}{b}\le {\sigma}^{2}\le \frac{\left(n-1\right){S}^{2}}{a}\right).$$
Thus the probability that the random interval $$\left[\frac{\left(n-1\right){S}^{2}}{b},\frac{\left(n-1\right){S}^{2}}{a}\right]$$ contains the unknown ${\sigma}^{2}$ is $1-\alpha $ . Once the values of ${X}_{1},{X}_{2},\mathrm{...},{X}_{n}$ are observed to be ${x}_{1},{x}_{2},\mathrm{...},{x}_{n}$ and ${s}^{2}$ is computed, the interval $$\left[\frac{\left(n-1\right){s}^{2}}{b},\frac{\left(n-1\right){s}^{2}}{a}\right]$$ is a $100\left(1-\alpha \right)\%$ confidence interval for ${\sigma}^{2}$ .
It follows that $$\left[\sqrt{\frac{n-1}{b}}\,s,\sqrt{\frac{n-1}{a}}\,s\right]$$ is a $100\left(1-\alpha \right)\%$ confidence interval for $\sigma $ , the standard deviation.
Assume that the time in days required for maturation of seeds of a species of flowering plant found in Mexico is $N\left(\mu ,{\sigma}^{2}\right)$ . A random sample of n = 13 seeds, both parents having narrow leaves, yielded $\overline{x}=18.97$ days and $12{s}^{2}={\displaystyle \sum _{i=1}^{13}{\left({x}_{i}-\overline{x}\right)}^{2}=128.41}$ .
A 90% confidence interval for ${\sigma}^{2}$ is $\left[\frac{128.41}{21.03},\frac{128.41}{5.226}\right]=\left[6.11,24.57\right]$ , because $5.226={\chi}_{0.95}^{2}\left(12\right)$ and $21.03={\chi}_{0.05}^{2}\left(12\right)$ , as can be read from the tabulated chi-square distribution. The corresponding 90% confidence interval for $\sigma $ is $\left[\sqrt{6.11},\sqrt{24.57}\right]=\left[2.47,4.96\right].$
Although a and b are generally selected so that the probabilities in the two tails are equal, the resulting $100\left(1-\alpha \right)\%$ confidence interval is not the shortest that can be formed using the available data. The tables and appendixes give solutions for a and b that yield the confidence interval of minimum length for the standard deviation.
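The seed maturation computation can be reproduced with a short sketch. It assumes the quantities given in the example: the sum of squared deviations $\left(n-1\right){s}^{2}=128.41$ and the tabulated chi-square values 5.226 and 21.03 for 12 degrees of freedom (lower and upper 5% points). The function name `variance_confidence_interval` is a hypothetical helper, and the critical values are passed in from the table rather than computed.

```python
import math

def variance_confidence_interval(sum_sq_dev, a, b):
    """Return (lower, upper) = ((n-1)s^2 / b, (n-1)s^2 / a) for sigma^2.

    sum_sq_dev is (n-1) * s^2, the sum of squared deviations from the mean.
    a and b are the tabulated chi-square values with a < b, so dividing by
    the larger value b gives the lower endpoint.
    """
    return sum_sq_dev / b, sum_sq_dev / a

# Values quoted in the example: 12 s^2 = 128.41, a = 5.226, b = 21.03.
var_lower, var_upper = variance_confidence_interval(128.41, a=5.226, b=21.03)

# Taking square roots of the endpoints gives the interval for sigma.
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(f"90% CI for sigma^2: [{var_lower:.2f}, {var_upper:.2f}]")
print(f"90% CI for sigma:   [{sd_lower:.2f}, {sd_upper:.2f}]")
```

Note that the larger chi-square value produces the lower endpoint, which is a common place to swap a and b by mistake.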