  1. The two independent samples are simple random samples from two distinct populations.
  2. For the two distinct populations:
    • if the sample sizes are small, the distributions are important (should be normal)
    • if the sample sizes are large, the distributions are not important (need not be normal)

The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch t-test. The degrees of freedom formula was developed by Aspin and Welch.

The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples. In order to account for the variation, we take the difference of the sample means, $\bar{X}_1 - \bar{X}_2$, and divide by the standard error in order to standardize the difference. The result is a t-score test statistic.

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error, of the difference in sample means, $\bar{X}_1 - \bar{X}_2$.

The standard error is:

$$\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}$$

The test statistic (t-score) is calculated as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}}$$

    Where:

  • $s_1$ and $s_2$, the sample standard deviations, are estimates of $\sigma_1$ and $\sigma_2$, respectively.
  • $\sigma_1$ and $\sigma_2$ are the unknown population standard deviations.
  • $\bar{x}_1$ and $\bar{x}_2$ are the sample means. $\mu_1$ and $\mu_2$ are the population means.
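
As a minimal sketch of the two formulas above, the following Python computes the standard error and the t-score from summary statistics. The function and argument names are illustrative assumptions, not part of the text. The input values are the summary statistics of the sports example later in this section.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Estimated standard deviation of the difference in sample means.

    The two sample variances are kept separate (not pooled).
    """
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def t_score(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Standardized difference of the sample means.

    hypothesized_diff is the value of mu1 - mu2 under the null hypothesis.
    """
    return ((mean1 - mean2) - hypothesized_diff) / standard_error(s1, n1, s2, n2)

# Girls: mean 2, s = 0.866, n = 9. Boys: mean 3.2, s = 1, n = 16.
se = standard_error(0.866, 9, 1.0, 16)
t = t_score(2.0, 0.866, 9, 3.2, 1.0, 16)
print(f"standard error = {se:.4f}, t = {t:.2f}")
```

With these inputs the t-score rounds to -3.14, which agrees with the calculator result quoted in the example.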

The number of degrees of freedom (df) requires a somewhat complicated calculation. However, a computer or calculator calculates it easily. The df is not always a whole number. The test statistic calculated previously is approximated by the Student's t-distribution with df as follows:

Degrees of freedom

$$df = \frac{\left(\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}\right)^2}{\left(\frac{1}{n_1 - 1}\right)\left(\frac{(s_1)^2}{n_1}\right)^2 + \left(\frac{1}{n_2 - 1}\right)\left(\frac{(s_2)^2}{n_2}\right)^2}$$

When both sample sizes $n_1$ and $n_2$ are five or larger, the Student's t approximation is very good. Notice that the sample variances $(s_1)^2$ and $(s_2)^2$ are not pooled. (If the question comes up, do not pool the variances.)

It is not necessary to compute this by hand. A calculator or computer easily computes it.
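
For readers who use software instead of a calculator, here is a minimal sketch of the degrees of freedom formula in Python. The function name `welch_df` is an illustrative assumption. The inputs are the sample standard deviations and sample sizes from the sports example later in this section.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for two independent samples with unpooled variances."""
    v1 = s1**2 / n1  # (s1)^2 / n1
    v2 = s2**2 / n2  # (s2)^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Girls: s = 0.866, n = 9. Boys: s = 1, n = 16.
df = welch_df(0.866, 9, 1.0, 16)
print(f"df = {df:.2f}")
```

The result is about 18.85 and is not a whole number. It may differ slightly in the later decimal places from a calculator's value, depending on how the inputs were rounded.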

Independent groups

The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in the table below. Each population has a normal distribution.

| | Sample Size | Average Number of Hours Playing Sports Per Day | Sample Standard Deviation |
|---|---|---|---|
| Girls | 9 | 2 | 0.866 |
| Boys | 16 | 3.2 | 1.00 |

Is there a difference in the mean amount of time boys and girls aged seven to 11 play sports each day? Test at the 5% level of significance.

The population standard deviations are not known. Let g be the subscript for girls and b be the subscript for boys. Then, $\mu_g$ is the population mean for girls and $\mu_b$ is the population mean for boys. This is a test of two independent groups, two population means.

Random variable: $\bar{X}_g - \bar{X}_b$ = difference in the sample mean amount of time girls and boys play sports each day.
$H_0: \mu_g = \mu_b$, equivalently $H_0: \mu_g - \mu_b = 0$
$H_a: \mu_g \neq \mu_b$, equivalently $H_a: \mu_g - \mu_b \neq 0$
The words "the same" tell you $H_0$ has an "=". Since there are no other words to indicate $H_a$, assume it says "is different." This is a two-tailed test.

Distribution for the test: Use $t_{df}$ where df is calculated using the df formula for independent groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the variances.

Calculate the p-value using a Student's t-distribution: p-value = 0.0054
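
The p-value can also be approximated without a statistics package. The sketch below is an assumption-laden illustration, not the calculator's method: it integrates the Student's t density numerically with Simpson's rule, using only Python's standard library. The function names and the `steps` parameter are hypothetical. The inputs are the test statistic (-3.14) and df (18.8462) reported by the calculator in this example.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t-distribution; df may be fractional."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p_value(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # area under the density from 0 to |t|
    return 1.0 - 2.0 * half_area

p_value = two_tailed_p_value(-3.14, 18.8462)
print(f"p-value = {p_value:.4f}")
```

The printed value should be close to the 0.0054 reported above. Because the density is symmetric about zero, half of that area lies in each tail.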

Graph:

This is a normal distribution curve representing the difference in the average amount of time girls and boys play sports all day. The mean is equal to zero, and the values -1.2, 0, and 1.2 are labeled on the horizontal axis. Two vertical lines extend from -1.2 and 1.2 to the curve. The region to the left of x = -1.2 and the region to the right of x = 1.2 are shaded to represent the p-value. The area of each region is 0.0028.


$s_g = 0.866$
$s_b = 1$
So, $\bar{x}_g - \bar{x}_b = 2 - 3.2 = -1.2$
Half the p-value is below –1.2 and half is above 1.2.

Make a decision: Since α > p-value, reject $H_0$. This means you reject $\mu_g = \mu_b$. The means are different.

Press STAT. Arrow over to TESTS and press 4:2-SampTTest. Arrow over to Stats and press ENTER. Arrow down and enter 2 for the first sample mean, 0.866 for Sx1, 9 for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2. Arrow down to μ1: and arrow to does not equal μ2. Press ENTER. Arrow down to Pooled: and No. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is p = 0.0054, the df is approximately 18.8462, and the test statistic is -3.14. Do the procedure again but instead of Calculate do Draw.

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to conclude that the mean number of hours that girls and boys aged seven to 11 play sports per day is different (mean number of hours boys aged seven to 11 play sports per day is greater than the mean number of hours played by girls OR the mean number of hours girls aged seven to 11 play sports per day is greater than the mean number of hours played by boys).

Source:  OpenStax, Introductory statistics. OpenStax CNX. May 06, 2016 Download for free at http://legacy.cnx.org/content/col11562/1.18