This course is a short series of lectures on Introductory Statistics. Topics covered are listed in the Table of Contents. The notes were prepared by Ewa Paszek and Marek Kimmel. The development of this course has been supported by NSF grant 0203396.

Sample size

A very frequently asked question in statistical consulting is: how large should the sample size be to estimate a mean?

The answer will depend on the variation associated with the random variable under observation. The statistician could correctly respond that only one item is needed, provided that the standard deviation of the distribution is zero. That is, if $\sigma = 0$, then the value of that one item would necessarily equal the unknown mean of the distribution. This is the extreme case and one that is not met in practice. However, the smaller the variance, the smaller the sample size needed to achieve a given degree of accuracy.

A mathematics department wishes to evaluate a new method of teaching calculus that does mathematics using a computer. At the end of the course, the evaluation will be based on the participating students' scores on a standard test. Because there is interest in estimating the mean score $\mu$ of students who take calculus using a computer, the department wants to determine the number of students, $n$, to be selected at random from a larger group. So let us find the sample size $n$ such that we are fairly confident that $\bar{x} \pm 1$ contains the unknown test mean $\mu$. From past experience, it is believed that the standard deviation associated with this type of test is 15. Accordingly, using the fact that the sample mean of the test scores, $\bar{X}$, is approximately $N(\mu, \sigma^2/n)$, the interval $\bar{x} \pm 1.96(15/\sqrt{n})$ serves as an approximate 95% confidence interval for $\mu$.

That is, $1.96(15/\sqrt{n}) = 1$, or equivalently $\sqrt{n} = 29.4$, and thus $n = 864.36$; because $n$ must be an integer, we round up to $n = 865$. It is quite likely that no one had anticipated that as many as 865 students would be needed in this study. If that is the case, the statistician must discuss with those involved in the experiment whether the accuracy and the confidence level could be relaxed somewhat. For illustration, rather than requiring $\bar{x} \pm 1$ to be a 95% confidence interval for $\mu$, possibly $\bar{x} \pm 2$ would be satisfactory as an 80% one. If this modification is acceptable, we now have $1.282(15/\sqrt{n}) = 2$, or equivalently $\sqrt{n} = 9.615$, and thus $n \approx 92.4$. Since $n$ must be an integer, $n = 93$ is used in practice.
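The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the helper name `n_for_half_width` is hypothetical, and it uses the rounded critical values 1.96 and 1.282 quoted in the text rather than exact normal quantiles.

```python
import math

def n_for_half_width(z: float, sigma: float, eps: float) -> int:
    """Smallest integer n with z * sigma / sqrt(n) <= eps (hypothetical helper)."""
    root_n = z * sigma / eps        # solve z * sigma / sqrt(n) = eps for sqrt(n)
    return math.ceil(root_n ** 2)   # round n up, since n must be an integer

# 95% interval x-bar ± 1 with sigma = 15: sqrt(n) = 29.4, n = 864.36
print(n_for_half_width(1.96, 15, 1))   # 865
# 80% interval x-bar ± 2 with sigma = 15: sqrt(n) = 9.615, n ≈ 92.4
print(n_for_half_width(1.282, 15, 2))  # 93
```

Rounding up (rather than to the nearest integer) is what guarantees the half-width does not exceed the target.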


Most likely, the people involved in this project would find this a more reasonable sample size. Of course, any sample size greater than 93 could be used. Then either the length of the confidence interval could be decreased from that of $\bar{x} \pm 2$, or the confidence coefficient could be increased from 80%, or a combination of both. Also, since there might be some question of whether the standard deviation $\sigma$ actually equals 15, the sample standard deviation $s$ would no doubt be used in the construction of the interval.

For example, suppose that the sample characteristics observed are $n = 145$, $\bar{x} = 77.2$, $s = 13.2$; then $\bar{x} \pm 1.282\, s/\sqrt{n}$, or $77.2 \pm 1.41$, provides an approximate 80% confidence interval for $\mu$.
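The interval in this example can be reproduced directly. This is a small sketch that, like the text, plugs the sample standard deviation $s$ into the normal-based formula in place of $\sigma$ and uses the rounded value 1.282 for 80% confidence; the variable names are only illustrative.

```python
import math

n, x_bar, s = 145, 77.2, 13.2   # sample characteristics from the example
z = 1.282                       # rounded z value for 80% confidence, as in the text

half_width = z * s / math.sqrt(n)   # s stands in for the unknown sigma
lower, upper = x_bar - half_width, x_bar + half_width
print(f"{x_bar} ± {half_width:.2f} -> ({lower:.2f}, {upper:.2f})")
# 77.2 ± 1.41 -> (75.79, 78.61)
```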

In general, if we want the $100(1-\alpha)\%$ confidence interval for $\mu$, $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$, to be no longer than that given by $\bar{x} \pm \varepsilon$, the sample size $n$ is the solution of $\varepsilon = z_{\alpha/2}\,\sigma/\sqrt{n}$, where $\Phi(z_{\alpha/2}) = 1 - \alpha/2$.

That is, $n = z_{\alpha/2}^2\,\sigma^2/\varepsilon^2$, where it is assumed that $\sigma^2$ is known.
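The general formula can be sketched in Python by computing $z_{\alpha/2}$ from the condition $\Phi(z_{\alpha/2}) = 1 - \alpha/2$ with the standard library's `statistics.NormalDist().inv_cdf`. The function names below are hypothetical; because exact quantiles are used instead of the rounded 1.96 and 1.282, intermediate values differ slightly from the hand calculation, although here the rounded-up sample sizes agree.

```python
import math
from statistics import NormalDist

def z_half_alpha(alpha: float) -> float:
    """Critical value z_{alpha/2}, defined by Phi(z) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def sample_size_mean(alpha: float, sigma: float, eps: float) -> int:
    """n = z_{alpha/2}^2 * sigma^2 / eps^2, rounded up (sigma assumed known)."""
    z = z_half_alpha(alpha)
    return math.ceil(z**2 * sigma**2 / eps**2)

print(round(z_half_alpha(0.05), 3))              # 1.96
print(sample_size_mean(0.05, sigma=15, eps=1))   # 865
print(sample_size_mean(0.20, sigma=15, eps=2))   # 93
```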

Sometimes $\varepsilon = z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the maximum error of the estimate. If the experimenter has no idea about the value of $\sigma^2$, it may be necessary to first take a preliminary sample to estimate $\sigma^2$.
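A preliminary-sample approach might look like the following sketch. It assumes the sample standard deviation $s$ of the pilot data is simply substituted for $\sigma$ in $n = z_{\alpha/2}^2\sigma^2/\varepsilon^2$; the function names and the pilot scores are hypothetical and chosen only so that $s^2 = 250$ is easy to check by hand.

```python
import math
import statistics
from statistics import NormalDist

def max_error(alpha: float, sigma: float, n: int) -> float:
    """Maximum error of the estimate: eps = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)

def sample_size_from_pilot(pilot: list[float], alpha: float, eps: float) -> int:
    """Hypothetical helper: use the pilot sample's s in place of the unknown sigma."""
    s = statistics.stdev(pilot)   # sample standard deviation (n - 1 divisor)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z**2 * s**2 / eps**2)

# Hypothetical pilot scores, for illustration only (s^2 = 250 for this list).
pilot = [10, 20, 30, 40, 50]
print(sample_size_from_pilot(pilot, alpha=0.05, eps=1))  # 961
print(max_error(0.05, sigma=15, n=865) <= 1)             # True
```

A pilot estimate of $\sigma$ is itself uncertain, so the resulting $n$ is a planning figure rather than a guarantee.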

The type of statistic we see most often in newspapers and magazines is an estimate of a proportion $p$. We might, for example, want to know the percentage of the labor force that is unemployed or the percentage of voters favoring a certain candidate. Sometimes extremely important decisions are made on the basis of these estimates. If this is the case, we would most certainly desire short confidence intervals for $p$ with large confidence coefficients. We recognize that these conditions will require a large sample size. On the other hand, if the fraction $p$ being estimated is not too important, an estimate associated with a longer confidence interval with a smaller confidence coefficient is satisfactory, and thus a smaller sample size can be used.

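In general, to find the required sample size to estimate $p$, recall that the point estimate of $p$ is $\hat{p} = y/n$ and that an approximate $100(1-\alpha)\%$ confidence interval for $p$ is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.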
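The approximate interval for $p$, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ with $\hat{p} = y/n$, can be sketched as follows. The function name `proportion_ci` and the counts $y = 50$, $n = 100$ are hypothetical, used only to show the computation.

```python
import math
from statistics import NormalDist

def proportion_ci(y: int, n: int, alpha: float) -> tuple[float, float]:
    """Approximate 100(1 - alpha)% interval p_hat ± z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = y / n                                   # point estimate of p
    z = NormalDist().inv_cdf(1 - alpha / 2)
    eps = z * math.sqrt(p_hat * (1 - p_hat) / n)    # maximum error of p_hat
    return p_hat - eps, p_hat + eps

# Hypothetical counts: y = 50 successes in n = 100 trials.
lo, hi = proportion_ci(50, 100, alpha=0.05)
print(f"({lo:.3f}, {hi:.3f})")   # (0.402, 0.598)
```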

Suppose we want an estimate of $p$ that is within $\varepsilon$ of the unknown $p$ with $100(1-\alpha)\%$ confidence, where $\varepsilon = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ is the maximum error of the point estimate $\hat{p} = y/n$. Since $\hat{p}$ is unknown before the experiment is run, we cannot use the value of $\hat{p}$ in our determination of $n$. However, if it is known that $p$ is about equal to $p^*$, the necessary sample size $n$ is the solution of $\varepsilon = z_{\alpha/2}\sqrt{p^*(1-p^*)/n}$. That is, $n = z_{\alpha/2}^2\, p^*(1-p^*)/\varepsilon^2$.
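The planning formula $n = z_{\alpha/2}^2\, p^*(1-p^*)/\varepsilon^2$ can be sketched in Python as below, rounding up to an integer. The function name and the planning values ($p^* = 0.2$ with $\varepsilon = 0.05$, and $p^* = 0.5$ with $\varepsilon = 0.03$) are hypothetical illustrations, not values from the notes.

```python
import math
from statistics import NormalDist

def sample_size_proportion(alpha: float, p_star: float, eps: float) -> int:
    """n = z_{alpha/2}^2 * p*(1 - p*) / eps^2, rounded up to an integer."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z**2 * p_star * (1 - p_star) / eps**2)

# Hypothetical planning values for a 95% interval.
print(sample_size_proportion(alpha=0.05, p_star=0.2, eps=0.05))  # 246
print(sample_size_proportion(alpha=0.05, p_star=0.5, eps=0.03))  # 1068
```

Because $p^*(1-p^*)$ is largest at $p^* = 0.5$, that value gives the largest (most cautious) $n$ when little is known about $p$.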

Source:  OpenStax, Introduction to statistics. OpenStax CNX. Oct 09, 2007 Download for free at http://cnx.org/content/col10343/1.3