# 2.3 Sample selectivity bias

We are interested in only a subset of these data. Table 2 reports the definitions of variables that are relevant for our analysis. We can get further insight into the data set using the summarize command. Table 3 reports the summary statistics for the data set.

Table 2: Variable definitions.

| Variable name | Definition |
|---|---|
| country | County of residence (categorical variable equal to 0, 1, ..., 9) |
| age | Age of the woman |
| education | Number of years of education of the woman |
| married | Dummy variable equal to 1 if the woman is married and 0 otherwise |
| children | Number of children that the woman has in her household |
| wage | Hourly wage rate of the woman |
| lw | Natural logarithm of hourly wage rate |
| work | Dummy variable equal to 1 if the individual is in the workforce and 0 otherwise |

Table 3: Summary statistics.

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| age | 2000 | 36.208 | 8.28656 | 20 | 59 |
| education | 2000 | 13.084 | 3.045912 | 10 | 20 |
| married | 2000 | .6705 | .4701492 | 0 | 1 |
| children | 2000 | 1.6445 | 1.398963 | 0 | 5 |
| wage | 1343 | 23.69217 | 6.305374 | 5.88497 | 45.80979 |
| lw | 1343 | 3.126703 | .2865111 | 1.772402 | 3.824498 |
| work | 2000 | .6715 | .4697852 | 0 | 1 |
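
As a minimal sketch, the summary statistics of Table 3 could be produced with commands along the following lines. This assumes the data set is already loaded in memory and that the variables carry the names listed in Table 2.

```stata
* Summary statistics for the analysis variables (compare with Table 3)
summarize age education married children wage lw work

* Number of women who are in the workforce
count if work == 1
```

Table 3 implies that 0.6715 × 2000 = 1343 women are in the workforce, which equals the number of non-missing observations on wage and lw. This is consistent with wages being observed only for women who work.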

We are interested in modeling two things: (1) the woman's decision to enter the labor force and (2) the determinants of the female wage rate. It might be reasonable to assume that the decision to enter the labor force depends on a woman's age, marital status, number of children, and level of education, and that the wage rate she earns depends on her age and education.
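
One way to write these two relationships down is sketched below. The notation is assumed here for illustration only: the latent index $\text{work}_i^{*}$, the coefficients $\gamma$ and $\beta$, and the error terms $u_i$ and $\varepsilon_i$ do not come from the tables, and using lw (the log wage) as the dependent variable of the wage equation is likewise an assumption.

$$
\begin{aligned}
\text{work}_i^{*} &= \gamma_0 + \gamma_1\,\text{age}_i + \gamma_2\,\text{education}_i + \gamma_3\,\text{married}_i + \gamma_4\,\text{children}_i + u_i, \\
\text{work}_i &= 1 \text{ if } \text{work}_i^{*} > 0 \text{ and } 0 \text{ otherwise}, \\
\text{lw}_i &= \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{education}_i + \varepsilon_i \quad \text{(observed only if } \text{work}_i = 1\text{)}.
\end{aligned}
$$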

## The decision to enter the labor force

We can use a probit regression to model the decision of a woman to enter the labor force. The results of this estimation are reported in Table 4. To be sure that we understand what the regression results mean, we can use the predict command to generate two additional variables. In particular, type in the following two commands after estimating the probit:

. predict zbhat, xb

. predict phat, p

These two commands generate (1) the linear prediction (zbhat) and (2) the predicted probability that the woman is in the workforce (phat). Table 5 reports the values of these two variables for observations 1 through 10.

Table 4: Probit estimates of the decision to enter the labor force.

. probit work age education married children

Iteration 0: log likelihood = -1266.2225

Iteration 4: log likelihood = -1027.0616

Probit estimates

| Number of obs | LR chi2(4) | Prob>chi2 | Log likelihood | Pseudo R2 |
|---|---|---|---|---|
| 2000 | 478.32 | 0.0000 | -1027.0616 | 0.1889 |

| work | Coef. | Std. Err. | z | P>z | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|---|---|
| age | .0347211 | .0042293 | 8.21 | 0.000 | .0264318 | .0430105 |
| education | .0583645 | .0109742 | 5.32 | 0.000 | .0368555 | .0798735 |
| married | .4308575 | .074208 | 5.81 | 0.000 | .2854125 | .5763025 |
| children | .4473249 | .0287417 | 15.56 | 0.000 | .3909922 | .5036576 |
| _cons | -2.467365 | .1925635 | -12.81 | 0.000 | -2.844782 | -2.089948 |

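
To see how the coefficients in Table 4 translate into a probability, the fitted index and the implied probability can be written out as below. The symbols $\hat{z}_i$ and $\Phi$ (the standard normal cumulative distribution function) are notation assumed here, and the woman in the worked example is hypothetical, not an observation from the data set.

$$
\hat{z}_i = -2.467365 + 0.0347211\,\text{age}_i + 0.0583645\,\text{education}_i + 0.4308575\,\text{married}_i + 0.4473249\,\text{children}_i,
\qquad
\widehat{\Pr}(\text{work}_i = 1) = \Phi(\hat{z}_i)
$$

For a hypothetical married woman aged 30 with 12 years of education and 2 children:

$$
\hat{z} = -2.467365 + 1.041633 + 0.700374 + 0.4308575 + 0.8946498 \approx 0.6001,
\qquad
\Phi(0.6001) \approx 0.726
$$

Because all four slope coefficients are positive, the predicted probability of being in the workforce rises with age, education, marriage, and the number of children.
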
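
Table 5: Linear prediction (zbhat) and predicted probability (phat) for observations 1 through 10.

| Observation | zbhat | phat |
|---|---|---|
| 1 | -0.68900 | 0.24541 |
| 2 | -0.20290 | 0.41961 |
| 3 | -0.48067 | 0.31538 |
| 4 | -0.16818 | 0.43322 |
| 5 | 0.34859 | 0.63630 |
| 6 | 0.58758 | 0.72159 |
| 7 | 0.97357 | 0.83486 |
| 8 | 0.45978 | 0.67716 |
| 9 | 0.01799 | 0.50718 |
| 10 | 0.32628 | 0.62790 |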
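
The two columns of Table 5 are linked: in a probit model the predicted probability is the standard normal cumulative distribution function evaluated at the linear prediction. The sketch below shows one way this could be checked. It assumes the probit and the two predict commands have already been run, that the function normal() is available (it is the standard normal CDF in current Stata releases; older releases may use a different name), and it uses phat_check as a hypothetical variable name.

```stata
* List the two predicted variables for the first ten observations (compare with Table 5)
list zbhat phat in 1/10

* phat should equal the standard normal CDF evaluated at zbhat
generate double phat_check = normal(zbhat)
assert abs(phat_check - phat) < 1e-5

* Spot checks by hand, using the zbhat values of observations 1 and 7 in Table 5
display normal(-0.68900)
display normal(0.97357)
```

The two display commands should return values close to 0.2454 and 0.8349, in line with the phat entries for observations 1 and 7. A negative zbhat corresponds to a predicted probability below one half, and a positive zbhat to one above it.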

