
An alternative approach to the sample selectivity problem is to use a maximum likelihood estimator. Heckman (1974) originally suggested estimating the parameters of the model by maximizing the average log likelihood function:

$$
L = \frac{1}{N}\sum_{i=1}^{N}\left\{ d_i \ln\left[\int_{-\mathbf{z}_i'\boldsymbol{\gamma}}^{\infty} \phi_{\varepsilon\nu}\left(y_i - \mathbf{x}_i'\boldsymbol{\beta},\, \nu\right) d\nu\right] + (1-d_i)\ln\left[\int_{-\infty}^{-\mathbf{z}_i'\boldsymbol{\gamma}}\int_{-\infty}^{\infty} \phi_{\varepsilon\nu}(\varepsilon,\nu)\, d\varepsilon\, d\nu\right]\right\},
$$

where $\phi_{\varepsilon\nu}$ is the probability density function of the bivariate normal distribution. Fortunately, Stata offers a single command for calculating either the two-step or the maximum likelihood estimators.
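For readers who want to see what is actually being maximized, the two integrals in the likelihood have a well-known closed form when the errors are bivariate normal. The sketch below assumes the usual normalization $\mathrm{Var}(\nu) = 1$, writes $\sigma$ for the standard deviation of $\varepsilon$ and $\rho$ for the correlation between $\varepsilon$ and $\nu$, and uses $\phi$ and $\Phi$ for the standard normal density and distribution function. None of these symbols is defined in the text above, so treat them as notation introduced here.

```latex
% Selected observations (d_i = 1): marginal density of epsilon times
% P(nu > -z_i'gamma | epsilon = y_i - x_i'beta)
\int_{-\mathbf{z}_i'\boldsymbol{\gamma}}^{\infty}
  \phi_{\varepsilon\nu}\left(y_i - \mathbf{x}_i'\boldsymbol{\beta},\, \nu\right) d\nu
  = \frac{1}{\sigma}\,
    \phi\!\left(\frac{y_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right)
    \Phi\!\left(\frac{\mathbf{z}_i'\boldsymbol{\gamma}
      + \rho\,(y_i - \mathbf{x}_i'\boldsymbol{\beta})/\sigma}{\sqrt{1-\rho^{2}}}\right)

% Non-selected observations (d_i = 0): P(nu <= -z_i'gamma)
\int_{-\infty}^{-\mathbf{z}_i'\boldsymbol{\gamma}}\int_{-\infty}^{\infty}
  \phi_{\varepsilon\nu}(\varepsilon,\nu)\, d\varepsilon\, d\nu
  = 1 - \Phi\!\left(\mathbf{z}_i'\boldsymbol{\gamma}\right)

% Substituting both pieces into the average log likelihood
L = \frac{1}{N}\sum_{i=1}^{N}\left\{
      d_i\left[
        \ln\Phi\!\left(\frac{\mathbf{z}_i'\boldsymbol{\gamma}
          + \rho\,(y_i - \mathbf{x}_i'\boldsymbol{\beta})/\sigma}{\sqrt{1-\rho^{2}}}\right)
        - \frac{1}{2}\left(\frac{y_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right)^{2}
        - \ln\!\left(\sigma\sqrt{2\pi}\right)
      \right]
      + (1-d_i)\ln\!\left[1 - \Phi\!\left(\mathbf{z}_i'\boldsymbol{\gamma}\right)\right]
    \right\}
```

In this form the non-selected observations contribute only a probit term, while the selected observations contribute a regression term adjusted for the correlation $\rho$. When $\rho = 0$ the likelihood separates into an ordinary regression likelihood and a probit likelihood.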

Estimation in Stata

Estimating either version of the Heckman sample selectivity model is straightforward in Stata. The basic command, which returns maximum likelihood estimates unless the twostep option is added, is:

. heckman depvar [varlist], select(varlist_s) [twostep]

or

. heckman depvar [varlist], select(depvar_s = varlist_s) [twostep]

The syntax for maximum-likelihood estimates is:

. heckman depvar [varlist] [weight] [if exp] [in range], select([depvar_s =] varlist_s [, offset(varname) noconstant]) [ robust cluster(varname) score(newvarlist|stub*) nshazard(newvarname) mills(newvarname) offset(varname) noconstant constraints(numlist) first noskip level(#) iterate(0) nolog maximize_options ]

The predict command has these options, among others:

xb , the default, calculates the linear predictions from the underlying regression equation.

ycond calculates the expected value of the dependent variable conditional on the dependent variable being observed/selected; E(y | y observed).

yexpected calculates the expected value of the dependent variable (y*), where that value is taken to be 0 when it is expected to be unobserved; y* = P(y observed) * E(y | y observed). The assumption of 0 is valid for many cases where nonselection implies non-participation (e.g., unobserved wage levels, insurance claims from those who are uninsured, etc.) but may be inappropriate for some problems (e.g., unobserved disease incidence).

An example using the heckman and predict commands is:

. heckman wage educ age, select(married children educ age)

. predict yhat

These two commands estimate, by maximum likelihood, a model in which wage is a function of education and age, with a selection equation that uses marital status, number of children, education level, and age to explain which individuals participate in the labor force. The predict command then stores the linear prediction from the wage equation in yhat. The help file in Stata provides additional information on the structure of the heckman command and is well worth printing out if you are dealing with a sample selectivity bias problem.
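As a sketch of how the options described above combine, the lines below rerun the wage example, request the other two predictions, and then re-estimate the same model with the twostep option. They assume a data set in memory containing the illustrative variables wage, educ, age, married, and children from the example; the new variable names yhat_xb, yhat_cond, and yhat_exp are made up for this illustration. The commands are written as do-file lines, so the leading dot prompt is omitted.

```stata
* Default maximum likelihood estimates of the wage and selection equations
heckman wage educ age, select(married children educ age)

* Linear prediction from the wage equation (xb is also the default)
predict yhat_xb, xb

* E(wage | wage observed)
predict yhat_cond, ycond

* P(wage observed) * E(wage | wage observed), treating unobserved wages as 0
predict yhat_exp, yexpected

* Same model estimated by the two-step procedure instead
heckman wage educ age, select(married children educ age) twostep
```

Comparing yhat_cond with yhat_exp shows how much of the expected wage is driven by the probability of being observed rather than by the wage equation itself.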

Example from Stata

We will illustrate various issues of selection bias using the data set available from the Stata site. Retrieve the data set by entering:

. use http://www.stata-press.com/data/imeus/womenwk, clear

This data set has 2,000 observations on 15 variables. We can use the describe command (. describe) to get a brief description of the data set:

Description of variables included in the data set from http://www.stata-press.com/data/imeus/womenwk.

obs:   2,000
vars:  15                    9 Nov 2004 20:23
size:  142,000 (86.5% of memory free)

Variable Name   Storage Type   Display Format   Value Label   Variable Label
c1              double         %10.0g
c2              double         %10.0g
u               double         %10.0g
v               (7,2)          %10.0g
country         float          %9.0g
age             int            %8.0g
education       int            %8.0g
married         byte           %8.0g
children        int            %8.0g
select          float          %9.0g
wageful         float          %9.0g
wage            float          %9.0g
lw              float          %9.0g
work            float          %9.0g
lwf             float          %9.0g
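A first pass over this data set might look like the sketch below. Because the variable labels in the listing are blank, the roles of the variables are assumptions made here for illustration: lw is taken to be the log wage, recorded as missing for women who do not work, and work is taken to be an indicator for labor force participation. Check both assumptions against the data before relying on the results.

```stata
* Load the data set used in this section
use http://www.stata-press.com/data/imeus/womenwk, clear

* List variable names, storage types, and display formats
describe

* Count observations with no recorded log wage (assumed: non-participants)
count if missing(lw)

* Tabulate the assumed participation indicator, showing any missing values
tabulate work, missing

* Maximum likelihood Heckman model: log wage on education and age, with
* selection explained by age, education, marital status, and children.
* With no depvar_s given, observations with missing lw are treated as not selected.
heckman lw education age, select(age education married children)
```

If the count of missing lw values matches the number of observations with work equal to 0, the two assumptions about lw and work are at least consistent with each other.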

Source:  OpenStax, Econometrics for honors students. OpenStax CNX. Jul 20, 2010 Download for free at http://cnx.org/content/col11208/1.2