# 10.1 Two population means with unknown standard deviations

1. The two independent samples are simple random samples from two distinct populations.
2. For the two distinct populations:
• if the sample sizes are small, the distributions are important (should be normal)
• if the sample sizes are large, the distributions are not important (need not be normal)

The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch t-test. The degrees of freedom formula was developed by Aspin and Welch.

The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples. In order to account for the variation, we take the difference of the sample means, $\overline{X}_1 - \overline{X}_2$, and divide by the standard error in order to standardize the difference. The result is a t-score test statistic.

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error, of the difference in sample means, $\overline{X}_1 - \overline{X}_2$.

## The standard error is:

$\sqrt{\frac{(s_1)^2}{n_1}+\frac{(s_2)^2}{n_2}}$

The test statistic (t-score) is calculated as follows:

$\frac{(\overline{x}_1-\overline{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{(s_1)^2}{n_1}+\frac{(s_2)^2}{n_2}}}$

## Where:

• $s_1$ and $s_2$, the sample standard deviations, are estimates of $\sigma_1$ and $\sigma_2$, respectively.
• $\sigma_1$ and $\sigma_2$ are the unknown population standard deviations.
• $\overline{x}_1$ and $\overline{x}_2$ are the sample means. $\mu_1$ and $\mu_2$ are the population means.
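
As a rough illustration of the two formulas above, the following Python sketch computes the standard error and the t-score from summary statistics. The function names `standard_error` and `t_score`, the `hypothesized_diff` parameter (standing in for $\mu_1 - \mu_2$ under the null hypothesis), and the input values are all assumptions made for illustration; none of them come from the text.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Estimated standard deviation of the difference in sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def t_score(xbar1, xbar2, s1, n1, s2, n2, hypothesized_diff=0.0):
    """t-score for the difference in sample means (variances are not pooled)."""
    se = standard_error(s1, n1, s2, n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / se

# Illustrative (made-up) summary statistics, not data from the text.
se = standard_error(s1=3.0, n1=9, s2=4.0, n2=16)  # sqrt(9/9 + 16/16) = sqrt(2)
t = t_score(xbar1=10.0, xbar2=8.0, s1=3.0, n1=9, s2=4.0, n2=16)
print(round(se, 4), round(t, 4))
```

With these inputs each variance term equals 1, so the standard error is $\sqrt{2}$ and the t-score is $2/\sqrt{2}$.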

The number of degrees of freedom (df) requires a somewhat complicated calculation. However, a computer or calculator calculates it easily. The df is not always a whole number. The test statistic calculated previously is approximated by the Student's t-distribution with df as follows:

## Degrees of freedom

$df=\frac{\left(\frac{(s_1)^2}{n_1}+\frac{(s_2)^2}{n_2}\right)^2}{\left(\frac{1}{n_1-1}\right)\left(\frac{(s_1)^2}{n_1}\right)^2+\left(\frac{1}{n_2-1}\right)\left(\frac{(s_2)^2}{n_2}\right)^2}$

When both sample sizes $n_1$ and $n_2$ are five or larger, the Student's t approximation is very good. Notice that the sample variances $(s_1)^2$ and $(s_2)^2$ are not pooled. (If the question comes up, do not pool the variances.)

It is not necessary to compute this by hand. A calculator or computer easily computes it.
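
For readers who want to see what the calculator is doing, here is a minimal Python sketch of the degrees of freedom formula. The function name `welch_df` and the input values are assumptions chosen for illustration; they are not taken from the text.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for two independent samples, variances not pooled."""
    v1 = s1**2 / n1  # (s1)^2 / n1
    v2 = s2**2 / n2  # (s2)^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Illustrative (made-up) inputs, not data from the text.
df_unequal = welch_df(s1=3.0, n1=9, s2=4.0, n2=16)  # 480/23, not a whole number
df_equal = welch_df(s1=2.0, n1=10, s2=2.0, n2=10)   # equal s and n reduce to 2(n - 1)
print(df_unequal, df_equal)
```

The second call is a quick sanity check: when both samples have the same size and the same standard deviation, the formula simplifies to $2(n-1)$, which is 18 here.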

## Independent groups

The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in the table below. Each population has a normal distribution.

| | Sample Size | Average Number of Hours Playing Sports Per Day | Sample Standard Deviation |
|---|---|---|---|
| Girls | 9 | 2 | 0.866 |
| Boys | 16 | 3.2 | 1.00 |

Is there a difference in the mean amount of time boys and girls aged seven to 11 play sports each day? Test at the 5% level of significance.

The population standard deviations are not known. Let g be the subscript for girls and b be the subscript for boys. Then, $\mu_g$ is the population mean for girls and $\mu_b$ is the population mean for boys. This is a test of two independent groups, two population means.

Random variable: $\overline{X}_g-\overline{X}_b$ = difference in the sample mean amount of time girls and boys play sports each day.
$H_0$: $\mu_g = \mu_b$, or equivalently $H_0$: $\mu_g - \mu_b = 0$
$H_a$: $\mu_g \ne \mu_b$, or equivalently $H_a$: $\mu_g - \mu_b \ne 0$
The words "the same" tell you $H_0$ has an "=". Since there are no other words to indicate $H_a$, assume it says "is different." This is a two-tailed test.

Distribution for the test: Use $t_{df}$, where df is calculated using the df formula for independent groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the variances.

Calculate the p-value using a Student's t-distribution: p-value = 0.0054
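
The calculation for this example can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the calculator's actual method: the helper names (`t_pdf`, `two_tailed_p`) are made up here, and the tail area is approximated by Simpson's rule applied to the Student's t density rather than by a statistics package. The inputs are the summary statistics for girls and boys given above.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t-distribution (df need not be a whole number)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t

# Summary statistics from the example: girls (g) and boys (b).
xbar_g, s_g, n_g = 2.0, 0.866, 9
xbar_b, s_b, n_b = 3.2, 1.0, 16

v_g, v_b = s_g**2 / n_g, s_b**2 / n_b            # variance terms, not pooled
t = (xbar_g - xbar_b) / math.sqrt(v_g + v_b)     # null hypothesis: mu_g - mu_b = 0
df = (v_g + v_b) ** 2 / (v_g**2 / (n_g - 1) + v_b**2 / (n_b - 1))
p_value = two_tailed_p(t, df)

print(f"t = {t:.2f}, df = {df:.2f}, p-value = {p_value:.4f}")
```

The results should agree with the values quoted in this example (a test statistic near -3.14, df near 18.85, and a p-value near 0.0054) up to rounding of the girls' standard deviation.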

Graph (figure not shown):

$s_g = 0.866$, $s_b = 1$

So, $\overline{x}_g - \overline{x}_b = 2 - 3.2 = -1.2$.

Half the p-value is below -1.2 and half is above 1.2.

Make a decision: Since α > p-value, reject $H_0$. This means you reject $\mu_g = \mu_b$. The means are different.

Press STAT. Arrow over to TESTS and press 4:2-SampTTest. Arrow over to Stats and press ENTER. Arrow down and enter 2 for the first sample mean, 0.866 for Sx1, 9 for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2. Arrow down to μ1: and arrow to ≠μ2. Press ENTER. Arrow down to Pooled: and select No. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is p = 0.0054, the df is approximately 18.8462, and the test statistic is -3.14. Repeat the procedure, but select Draw instead of Calculate.

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to conclude that the mean number of hours that girls and boys aged seven to 11 play sports per day is different (either boys aged seven to 11 play sports for more hours per day, on average, than girls, or girls play for more hours than boys).
