This module provides an overview of Testing the Significance of the Correlation Coefficient for Roberta Bloom's Custom Collection of Collaborative Statistics col10617. It has been modified from the original module m17077, Facts About the Correlation Coefficient for Linear Regression, which is part of Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean. The test of significance is presented as a hypothesis test both using the p-value and using a table of critical values. Some of the material from the original module m17077 has been moved to module m33269 in Bloom's custom collection of Collaborative Statistics.
Testing the significance of the correlation coefficient
The correlation coefficient, r, tells us about the strength of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together.
We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough and reliable enough to use to model the relationship in the population.
The sample data is used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.
- The symbol for the population correlation coefficient is ρ, the Greek letter "rho".
- ρ = population correlation coefficient (unknown)
- r = sample correlation coefficient (known; calculated from sample data)
The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to 0" or "significantly different from 0". We decide this based on the sample correlation coefficient r and the sample size n.
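As a rough illustration of how r and n combine in such a test, the sketch below uses the standard t-test for a correlation coefficient, with null hypothesis ρ = 0 and alternative ρ ≠ 0. This section does not state the formula, so the test statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom, the default significance level of 0.05, and all function names are assumptions made for the example. The p-value is approximated by numerical integration so that only the Python standard library is needed.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho = 0; it has n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 data points")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=10000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    if steps % 2:          # Simpson's rule needs an even number of steps
        steps += 1
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total   # P(-|t| < T < |t|), by symmetry
    return max(0.0, 1.0 - central)

def is_significant(r, n, alpha=0.05):
    """True if r differs significantly from 0 at level alpha (assumed 0.05)."""
    t = t_statistic(r, n)
    return two_sided_p_value(t, n - 2) < alpha
```

Calling `is_significant(r, n)` with the same value of r but different sample sizes shows why both quantities matter: a moderate r can be significant for a large n and not significant for a small one.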
If the test concludes that the correlation coefficient is significantly different from 0, we say that the correlation coefficient is "significant".
- Conclusion: "The correlation coefficient IS SIGNIFICANT."
- What the conclusion means:
We believe that there is a significant linear relationship between x and y. We can use the regression line to model the linear relationship between x and y in the population.
If the test concludes that the correlation coefficient is not significantly different from 0 (it is close to 0), we say that the correlation coefficient is "not significant".
- Conclusion: "The correlation coefficient IS NOT SIGNIFICANT."
- What the conclusion means:
We do NOT believe that there is a significant linear relationship between x and y. Therefore we can NOT use the regression line to model a linear relationship between x and y in the population.
- If $r$ is significant and the scatter plot shows a reasonable linear trend, the line can be used to predict the value of $y$ for values of $x$ that are within the domain of observed $x$ values.
- If $r$ is not significant OR if the scatter plot does not show a reasonable linear trend, the line should not be used for prediction.
- Even if $r$ is significant and the scatter plot shows a reasonable linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed $x$ values in the data.
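The three guidelines above can be read as a single decision rule. The following is a minimal sketch of that rule. The function name and its parameters are hypothetical, and whether the scatter plot "shows a reasonable linear trend" is a visual judgment that the caller must supply.

```python
def can_use_line_for_prediction(x_new, observed_x, r_is_significant, looks_linear):
    """Return True only when all three guidelines allow prediction at x_new.

    r_is_significant: result of the significance test on r
    looks_linear:     the reader's judgment of the scatter plot
    observed_x:       the x values in the sample data
    """
    # r not significant OR no reasonable linear trend: do not predict.
    if not (r_is_significant and looks_linear):
        return False
    # Predict only within the domain of observed x values.
    return min(observed_x) <= x_new <= max(observed_x)
```

The check treats the smallest and largest observed $x$ values as the ends of the domain, so a value equal to either endpoint is still considered inside it.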