# 10.1 Two population means with unknown standard deviations

1. The two independent samples are simple random samples from two distinct populations.
2. For the two distinct populations:
• if the sample sizes are small, the distributions are important (should be normal)
• if the sample sizes are large, the distributions are not important (need not be normal)

The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch t-test. The degrees of freedom formula was developed by Aspin and Welch.

The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples. In order to account for the variation, we take the difference of the sample means, ${\overline{X}}_{1}-{\overline{X}}_{2}$, and divide by the standard error in order to standardize the difference. The result is a t-score test statistic.

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error, of the difference in sample means, ${\overline{X}}_{1}-{\overline{X}}_{2}$.

## The standard error is:

$\sqrt{\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}}$

The test statistic (t-score) is calculated as follows:

$\frac{\left({\overline{x}}_{1}-{\overline{x}}_{2}\right)-\left({\mu }_{1}-{\mu }_{2}\right)}{\sqrt{\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}}}$

## Where:

• ${s}_{1}$ and ${s}_{2}$, the sample standard deviations, are estimates of ${\sigma }_{1}$ and ${\sigma }_{2}$, respectively.
• ${\sigma }_{1}$ and ${\sigma }_{2}$ are the unknown population standard deviations.
• ${\overline{x}}_{1}$ and ${\overline{x}}_{2}$ are the sample means. ${\mu }_{1}$ and ${\mu }_{2}$ are the population means.
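The two formulas above can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function names `standard_error` and `t_score` and the parameter `hypothesized_diff` are made up for this sketch, and the numbers plugged in are the girls/boys summary statistics from the example later in this section.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Standard error of the difference in sample means (variances not pooled)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def t_score(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """Standardized difference: ((xbar1 - xbar2) - (mu1 - mu2)) / standard error."""
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error(s1, n1, s2, n2)

# Girls: n = 9, mean = 2, s = 0.866; boys: n = 16, mean = 3.2, s = 1.00
t = t_score(2.0, 0.866, 9, 3.2, 1.00, 16)
print(round(t, 2))  # -3.14
```

Under the null hypothesis the population means are equal, so `hypothesized_diff` defaults to 0.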

The number of degrees of freedom (df) requires a somewhat complicated calculation. However, a computer or calculator calculates it easily. The df are not always a whole number. The test statistic calculated previously is approximated by the Student's t-distribution with df as follows:

## Degrees of freedom

$df=\frac{{\left(\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}\right)}^{2}}{\left(\frac{1}{{n}_{1}-1}\right){\left(\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}\right)}^{2}+\left(\frac{1}{{n}_{2}-1}\right){\left(\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}\right)}^{2}}$

When both sample sizes ${n}_{1}$ and ${n}_{2}$ are five or larger, the Student's t approximation is very good. Notice that the sample variances ${\left({s}_{1}\right)}^{2}$ and ${\left({s}_{2}\right)}^{2}$ are not pooled. (If the question comes up, do not pool the variances.)

It is not necessary to compute this by hand. A calculator or computer easily computes it.
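For readers who want to see what the calculator is doing, here is a minimal Python sketch of the degrees of freedom formula. The function name `welch_df` is an assumption for this illustration, and the inputs are the girls/boys summary statistics from the example later in this section.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for two independent samples, variances not pooled."""
    v1 = s1**2 / n1  # (s1)^2 / n1
    v2 = s2**2 / n2  # (s2)^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Girls: n = 9, s = 0.866; boys: n = 16, s = 1.00
df = welch_df(0.866, 9, 1.00, 16)
print(round(df, 2))
```

The result is not a whole number. It should agree with the value of about 18.85 quoted in the example to two decimal places; a small difference in later digits is expected because 0.866 is itself a rounded standard deviation.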

## Independent groups

The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in the following table. Each population has a normal distribution.

| Group | Sample Size | Average Number of Hours Playing Sports Per Day | Sample Standard Deviation |
|---|---|---|---|
| Girls | 9 | 2 | 0.866 |
| Boys | 16 | 3.2 | 1.00 |

Is there a difference in the mean amount of time boys and girls aged seven to 11 play sports each day? Test at the 5% level of significance.

The population standard deviations are not known. Let g be the subscript for girls and b be the subscript for boys. Then, ${\mu }_{g}$ is the population mean for girls and ${\mu }_{b}$ is the population mean for boys. This is a test of two independent groups, two population means.

Random variable: ${\overline{X}}_{g}-{\overline{X}}_{b}$ = difference in the sample mean amount of time girls and boys play sports each day.
${H}_{0}$: ${\mu }_{g}={\mu }_{b}$, or equivalently ${H}_{0}$: ${\mu }_{g}-{\mu }_{b}=0$
${H}_{a}$: ${\mu }_{g}\ne {\mu }_{b}$, or equivalently ${H}_{a}$: ${\mu }_{g}-{\mu }_{b}\ne 0$
The words "the same" tell you ${H}_{0}$ has an "=". Since there are no other words to indicate ${H}_{a}$, assume it says "is different." This is a two-tailed test.

Distribution for the test: Use ${t}_{df}$ where df is calculated using the df formula for independent groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the variances.

Calculate the p-value using a Student's t-distribution: p-value = 0.0054
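The p-value is the area in both tails of the Student's t-distribution beyond the test statistic. A calculator or statistics package does this directly; the sketch below only illustrates the idea with the Python standard library by integrating the t density numerically with Simpson's rule. The function names `t_pdf` and `two_tailed_p` and the `steps` parameter are assumptions for this illustration, and the inputs are the test statistic (about -3.1424) and df (about 18.8462) from this example.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|.

    The area from 0 to |t| is found with composite Simpson's rule
    (steps must be even) and doubled, since the density is symmetric.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

p = two_tailed_p(-3.1424, 18.8462)
print(round(p, 4))
```

The printed value should be close to the 0.0054 reported by the calculator. Half of it lies in each tail, which is why the test is called two-tailed.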

Graph:

${s}_{g}=0.866$
${s}_{b}=1$
So, ${\overline{x}}_{g}-{\overline{x}}_{b}=2-3.2=-1.2$
Half the p -value is below –1.2 and half is above 1.2.

Make a decision: Since α > p-value, reject ${H}_{0}$. This means you reject ${\mu }_{g}={\mu }_{b}$. The means are different.

Press STAT. Arrow over to TESTS and press 4:2-SampTTest. Arrow over to Stats and press ENTER. Arrow down and enter 2 for the first sample mean, 0.866 for Sx1, 9 for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2. Arrow down to μ1: and arrow to does not equal μ2. Press ENTER. Arrow down to Pooled: and No. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is p = 0.0054, the df are approximately 18.8462, and the test statistic is -3.14. Do the procedure again but instead of Calculate do Draw.

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to conclude that the mean number of hours that girls and boys aged seven to 11 play sports per day is different (mean number of hours boys aged seven to 11 play sports per day is greater than the mean number of hours played by girls OR the mean number of hours girls aged seven to 11 play sports per day is greater than the mean number of hours played by boys).
