Looking at the third panel of output we can write the equation as:

Y = b₀ + b₁X₁ + b₂X₂ + e
where b₀ is the intercept, b₁ is the estimated coefficient on price, b₂ is the estimated coefficient on income, and e is the error term. The equation is written in Roman letters, indicating that these are the estimated values and not the population parameters, the β's.
Our estimated equation, with b₀ the estimated intercept reported in the output, is:

Ŷ = b₀ − 3.22X₁ + 0.0167X₂
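Estimates like these are ordinary least squares solutions; a spreadsheet's regression tool solves the same minimization. A minimal sketch of how such coefficients are computed, using entirely hypothetical data (these numbers are illustrative only and are not the chapter's data set):

```python
import numpy as np

# Hypothetical data: price (dollars), per capita income (thousands of
# dollars), and quantity of bricks demanded (tons). Illustrative only.
price  = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
income = np.array([40.0, 50.0, 42.0, 53.0, 44.0, 47.0])
qty    = np.array([10.0, 8.5, 7.2, 6.1, 5.0, 4.2])

# Build the design matrix [1, X1, X2] and solve the least squares
# problem min ||y - Xb||^2 for b = (b0, b1, b2).
X = np.column_stack([np.ones_like(price), price, income])
b, *_ = np.linalg.lstsq(X, qty, rcond=None)
b0, b1, b2 = b
print(f"intercept={b0:.3f}, price coef={b1:.3f}, income coef={b2:.3f}")
```

The least squares fit with an intercept is guaranteed to explain at least as much variation as the mean alone, which is why R squared is never negative.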
In other words, increasing the price by one dollar will lead to a 3.22-ton reduction in bricks purchased. Similarly, increasing per capita national income by one thousand dollars will lead to a 0.0167-ton increase in bricks consumed. This should make intuitive sense from economic theory: price increases lead to less quantity demanded as people substitute away from bricks as a building material, and as income increases the quantity demanded increases as people build more houses and office buildings and need more bricks. It is important to have a theory first that predicts the significance, or at least the direction, of the coefficients. Without a theory to test, this research tool is not much more helpful than the correlation coefficient we learned about earlier.
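Because the model is linear, the predicted change in quantity demanded for any combination of changes is just the sum of the coefficient-weighted changes. A quick sketch of this interpretation, using the coefficients reported above:

```python
# Estimated marginal effects from the regression output above.
b_price = -3.22    # tons per dollar of price
b_income = 0.0167  # tons per thousand dollars of per capita income

def predicted_change(d_price, d_income):
    """Predicted change in tons of bricks demanded for a change of
    d_price dollars in price and d_income thousand dollars in income."""
    return b_price * d_price + b_income * d_income

print(predicted_change(1, 0))  # a $1 price increase: -3.22 tons
print(predicted_change(0, 1))  # a $1,000 income increase: +0.0167 tons
```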
We cannot stop there. We first need to check whether our coefficients are statistically significantly different from zero. We set up the hypotheses:

H₀: βᵢ = 0
Hₐ: βᵢ ≠ 0
for both coefficients. Recall from earlier that we will not be able to say definitively that our estimated b₁ is the actual population β₁; rather, with (1 − α) confidence we test whether b₁ is significantly different from zero. The analyst is claiming that price has an impact on quantity demanded, so the claim goes in the alternative hypothesis. It will take a very small probability of being wrong, 0.05 in this case, to overthrow the null hypothesis, the status quo, that β₁ = 0. In all regression hypothesis tests the claim is in the alternative: the claim is that the theory has found a variable with a significant impact on the Y variable.
The test statistic follows the standardizing formula:

t = (b₁ − β₁) / S(b₁) = (b₁ − 0) / S(b₁)

where S(b₁) is the standard error of the coefficient estimate and the hypothesized value β₁ = 0 comes from the null hypothesis.
The computer calculates this test statistic and reports it as the "t stat," found to the right of the standard error of the coefficient estimate. To reach a conclusion we compare this test statistic with the critical value of the Student's t with n − 2 − 1 = 16 degrees of freedom, placing α/2 = 0.025 in each tail for a two-tailed test at 95% confidence. Our t stat for b₁ is −3.54, whose absolute value is greater than the critical value of 2.12 (the value you would look up in a t table), so we cannot accept our null hypothesis of no effect. We conclude that price has a significant effect on Y because the calculated t value is in the tail. You can conduct the same test for b₂. This time the calculated t statistic for the per capita income coefficient, 0.145, is not in the tail. A t stat of 0.145 says that the estimated coefficient is only a small fraction of a standard deviation away from zero. We cannot reject the null hypothesis of no relationship; with these data we do not find that per capita income has a significant effect on the demand for bricks.
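The table lookup and comparison above can be reproduced numerically. A minimal sketch using scipy, with the t statistics taken from the output described above:

```python
from scipy import stats

df = 16       # n - 2 - 1 degrees of freedom
alpha = 0.05  # two-tailed test at 95% confidence

# Critical value: put alpha/2 = 0.025 in each tail.
t_crit = stats.t.ppf(1 - alpha / 2, df)  # about 2.12

# Compare each reported t stat against the critical value.
for name, t_stat in [("price", -3.54), ("income", 0.145)]:
    reject_h0 = abs(t_stat) > t_crit
    print(f"{name}: t = {t_stat}, reject H0? {reject_h0}")
```

This prints that the null is rejected for price but not for income, matching the conclusions above.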
These tests tell us whether an individual coefficient is significantly different from zero, but they do not address the overall quality of the model. We have seen that the R squared adjusted for degrees of freedom indicates that this model, with these two variables, explains 70% of the variation in the quantity of bricks demanded. We can also conduct a second test of the model taken as a whole: the F test presented in section 13.4 of this chapter. Because this is a multiple regression (more than one X), we use the F test to determine whether our coefficients collectively affect Y. The hypotheses are:

H₀: β₁ = β₂ = 0
Hₐ: at least one βᵢ ≠ 0
Under the ANOVA section of the output we find the calculated F statistic for this hypothesis. For this example the F statistic is 21.9. Again, comparing the calculated F statistic with the critical value, given our desired level of confidence and the degrees of freedom, will allow us to reach a conclusion. The critical value can be found in an F table with 2 degrees of freedom in the numerator (the number of X variables) and n − 2 − 1 = 16 degrees of freedom in the denominator; because the F test is one-tailed, all of α = 0.05 goes in the upper tail. The critical value is approximately 3.63, and therefore we cannot accept the null hypothesis because the calculated F is in the tail. By not being able to accept the null hypothesis we conclude that this specification of the model has validity, because at least one of the estimated coefficients is significantly different from zero. Since the calculated F is greater than the critical F, we reject H₀, meaning that X₁ and X₂ together have a significant effect on Y.
An alternative way to reach this conclusion is to use the p-value comparison rule. The p-value is the area in the tail beyond the calculated F statistic; in essence, the computer is looking up the F table for us. This probability is reported under "Significance F." For this example it is calculated to be 2.6 × 10⁻⁵, that is, 2.6 with the decimal moved five places to the left: 0.000026. This is an almost infinitesimal probability and is certainly less than our α of 0.05. Again we conclude that we cannot accept the null hypothesis.
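Both the critical-value comparison and the p-value rule for the F test can be checked in a few lines; a sketch using scipy and the F statistic reported in the output:

```python
from scipy import stats

f_stat = 21.9  # calculated F from the ANOVA section of the output
df_num = 2     # numerator df: number of X variables
df_den = 16    # denominator df: n - 2 - 1

# Critical value for a one-tailed F test at alpha = 0.05.
f_crit = stats.f.ppf(0.95, df_num, df_den)

# p-value: area in the upper tail beyond the calculated F statistic.
p_value = stats.f.sf(f_stat, df_num, df_den)

print(f"F crit = {f_crit:.2f}, p = {p_value:.2e}")  # p is about 2.6e-05
```

Either comparison, f_stat > f_crit or p_value < 0.05, leads to the same rejection of H₀.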
The development of computing machinery, and of the software useful for academic and business research, has made it possible to answer questions that just a few years ago we could not even formulate. Data are available in electronic format and can be moved into place for analysis in ways and at speeds that were unimaginable a decade ago. The sheer magnitude of the data sets that can today be used for research and analysis gives us a higher quality of results than in days past. Even with only an Excel spreadsheet we can conduct very high-level research. This section gives you the tools to conduct some of this very interesting research, with the only limit being your imagination.