
In practice, we rarely know the population standard deviation σ. In the past, when the sample size was large, this did not present a problem to statisticians. They used the sample standard deviation s as an estimate for σ and proceeded as before to calculate a confidence interval, with results that were close enough. However, statisticians ran into problems when the sample size was small. A small sample size caused inaccuracies in the confidence interval.

William S. Gosset (1876–1937) of the Guinness brewery in Dublin, Ireland, ran into this problem. His experiments with hops and barley produced very few samples. Just replacing σ with s did not produce accurate results when he tried to calculate a confidence interval. He realized that he could not use a normal distribution for the calculation; he found that the actual distribution depends on the sample size. This problem led him to "discover" what is called the Student's t-distribution. The name comes from the fact that Gosset wrote under the pen name "Student."

Up until the mid-1970s, some statisticians used the normal distribution approximation for large sample sizes and used the Student's t-distribution only for sample sizes of at most 30. With graphing calculators and computers, the practice now is to use the Student's t-distribution whenever s is used as an estimate for σ.

If you draw a simple random sample of size n from a population that has an approximately normal distribution with mean μ and unknown population standard deviation σ and calculate the t-score $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, then the t-scores follow a Student's t-distribution with n – 1 degrees of freedom. The t-score has the same interpretation as the z-score: it measures how far $\bar{x}$ is from its mean μ. For each sample size n, there is a different Student's t-distribution.
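The t-score calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function name `t_score`, the sample values, and the assumed population mean of 10.0 are all hypothetical.

```python
import math
import statistics

def t_score(sample, mu):
    """Return (t, df) for a sample and an assumed population mean mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample standard deviation (divides by n - 1)
    t = (x_bar - mu) / (s / math.sqrt(n))   # t = (x_bar - mu) / (s / sqrt(n))
    return t, n - 1                         # degrees of freedom are n - 1

# Hypothetical data: five measurements, assumed population mean 10.0
sample = [9.8, 10.2, 10.4, 9.9, 10.7]
t, df = t_score(sample, mu=10.0)
print(f"t = {t:.3f}, df = {df}")
```

Note that `statistics.stdev` already uses the n – 1 divisor, so s here is the sample standard deviation described in the text rather than the population version.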

The degrees of freedom, n – 1, come from the calculation of the sample standard deviation s. In [link], we used n deviations ($x - \bar{x}$ values) to calculate s. Because the sum of the deviations is zero, we can find the last deviation once we know the other n – 1 deviations. The other n – 1 deviations can change or vary freely. We call the number n – 1 the degrees of freedom (df).
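A small numeric check may make the "last deviation is determined" argument concrete. The data below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics

sample = [4.0, 7.0, 1.0, 8.0]               # hypothetical data, n = 4
x_bar = statistics.mean(sample)             # sample mean
deviations = [x - x_bar for x in sample]    # the n deviations x - x_bar

# The n deviations always sum to zero, so once the first n - 1 are known,
# the last one is forced: it must be the negative of their sum.
known = deviations[:-1]
recovered_last = -sum(known)

print("deviations:", deviations)
print("last deviation recovered from the others:", recovered_last)
```

Only three of the four deviations can vary freely here, which is why this sample has 3 degrees of freedom.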

    Properties of the Student's t-distribution

  • The graph for the Student's t-distribution is similar to the standard normal curve.
  • The mean for the Student's t-distribution is zero and the distribution is symmetric about zero.
  • The Student's t-distribution has more probability in its tails than the standard normal distribution because the spread of the t-distribution is greater than the spread of the standard normal. So the graph of the Student's t-distribution will be thicker in the tails and shorter in the center than the graph of the standard normal distribution.
  • The exact shape of the Student's t-distribution depends on the degrees of freedom. As the degrees of freedom increase, the graph of the Student's t-distribution becomes more like the graph of the standard normal distribution.
  • The underlying population of individual observations is assumed to be normally distributed with unknown population mean μ and unknown population standard deviation σ. The size of the underlying population is generally not relevant unless it is very small. If the population is bell-shaped (approximately normal), then the assumption is met and doesn't need discussion. Random sampling is assumed, but that is a completely separate assumption from normality.
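The shape properties listed above can be checked numerically. The sketch below uses only the Python standard library and evaluates the standard textbook density formulas for the t and standard normal distributions; the helper names `t_pdf` and `normal_pdf` and the chosen df values are illustrative assumptions, not part of the original text.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom at x."""
    # Work on the log scale (lgamma) so large df values do not overflow.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + x * x / df))

# Compare height at the center (x = 0) and in the tail (x = 3).
for df in (2, 5, 30, 1000):
    print(f"df={df:5d}  center={t_pdf(0, df):.4f}  tail={t_pdf(3, df):.4f}")
print(f"normal    center={normal_pdf(0):.4f}  tail={normal_pdf(3):.4f}")
```

For small df the t density should come out lower at the center and higher in the tail than the normal density, and both gaps should shrink as df grows.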

Questions & Answers

what do they mean in a question when you are asked to find P40 and P88
Megrina Reply
I don't get your question! What are you talking about?
Mani
hi
Mehri
You're asked to find the 40th and 88th percentiles: P40 is the value below which 40% of the data fall, and P88 is the value below which 88% of the data fall.
Joseph
hi
ravi
any suggestions for statistics app better than this
ravi
Sorry, I miswrote the question.
omar
No problem. By the way, I need a program for statistical data analysis. Any suggestions?
Mani
EViews will help you.
Kwadwo
Hello
Okonkwo
Are there any data analysts here working on SAS statistical model building?
ravi
A Dictionary of Computing. Measures of location: quantities that represent the average or typical value of a random variable (compare measures of variation). They are either properties of a probability distribution or computed statistics of a sample. Three important measures are the mean, median, and mode.
Ahmed Reply
define the measures of location
Kaynaat Reply
A Dictionary of Computing. Measures of location: quantities that represent the average or typical value of a random variable (compare measures of variation). They are either properties of a probability distribution or computed statistics of a sample. Three important measures are the mean, median, and mode.
Ahmed
What is a confidence interval estimate, and what is the formula for it?
Jhezarie Reply
Discuss the roles of vital and health statistics in the planning of health services for the community.
BITRUS Reply
given that the probability of
BITRUS
can man city win Liverpool ?
Emmanuel Reply
There are two coins on a table. When both are flipped, one coin lands on heads with probability 0.5 while the other lands on heads with probability 0.6. A coin is randomly selected from the table and flipped. (a) What is the probability it lands on heads? (b) Given that it lands on tails, what is the Condi
Nusrat Reply
0.5*0.5+0.5*0.6
Ravasz
what is gradient descent?
Saurav Reply
It should be a machine learning term.
Mok
It is a term used in linear regression.
Saurav
What are the differences between standard deviation and variance?
Enhance
what is statistics
Emmanuel Reply
statistics is the collection and interpretation of data
Enhance
the science of summarization and description of numerical facts
Enhance
Is the estimation of probability
Zaini
Mr. Zaini, can you tell me more clearly how to calculate a paired t-test?
Haai
do you have MG Akarwal Statistics' book Zaini?
Enhance
Haai how r u?
Enhance
maybe .... mathematics is the science of simplification and statistics is the interpretation of such values and its implications.
Miguel
Can we discuss the paired t-test?
Haai
what is outlier?
Usama Reply
outlier is an observation point that is distant from other observations.
Gidigah
what is its effect on mode?
Usama
Outliers have little effect on the mode of a given set of data.
Gidigah
How can you identify possible outliers in a data set?
Daniel
The best visualization method for identifying outliers is the box-and-whisker plot (boxplot). Points located beyond the ends of the whiskers, on either side, are considered outliers.
Akash
@Daniel Adunkwah - Usually you can identify an outlier visually. They lie outside the observed pattern of the other data points, thus they're called outliers.
Ron
what is completeness?
Muhammad
I am new to this. I am trying to learn.
Dom
I am also new Dom, welcome!
Nthabi
thanks
Dom
Please, my friend, I want some general points about statistics. Say something.
alex
outliers do not have effect on mode
Meselu
also new
yousaf
I don't get the example
Hadekunle Reply
ways of collecting data at least 10 and explain
Ridwan Reply
Example of discrete variable
Bada Reply
sales made monthly.
Gbenga
I am new here, can I get someone to guide me?
alayo
A die's outcome is 1, 2, 3, 4, 5, or 6; nothing falls outside of that. It is an example of a discrete variable.
jainesh
A continuous variable can take any value in an interval, for example any value between 0 and 1, such as 0.1, 0.21, 0.13, 0.623, or 0.32.
jainesh
How to answer quantitative data
Alhassan Reply
hi
Kachalla
what's up here ... am new here
Kachalla
Sorry, the question is a bit unclear. Do you mean how do you analyze quantitative data? If yes, it depends on the specific question(s) you set in the beginning as well as on the data you collected. So the method of data analysis will depend on the data collected and the questions asked.
Bheka
how to solve for degree of freedom
saliou
Quantitative data is data in numeric form. For example, the income of a person surveyed is 10,000. That is quantitative data; on the other hand, data recording whether a person is male or female is qualitative data.
Rohan
Degrees of freedom are the number of values that are free to vary. For example, if you have a total of n observations and you have to calculate the variance, you will first need the mean. Here the mean is a condition, without which you cannot calculate the variance. Therefore the degrees of freedom for the variance will be n - 1.
Rohan
data that is best presented in categories like haircolor, food taste (good, bad, fair, terrible) constitutes qualitative data
Bheka
vegetation types (grasslands, forests etc) qualitative data
Bheka
I don't understand how you solved it can you teach me
Caleb Reply
solve what?
Ambo
mean
Vanarith

Source:  OpenStax, Introductory statistics. OpenStax CNX. May 06, 2016 Download for free at http://legacy.cnx.org/content/col11562/1.18