April 6, 2016

Deeper Knowledge: 2 stage least squares model – Instrumental variable method and justification

This time I wanted to dig into some econometrics and give a brief, simple explanation of the instrumental variable method. I am writing this post in English so that everyone who needs it can follow the concept. So without further ado, let’s get to it.
In this assignment, the relationship between a woman’s wage (lnwage) as the dependent variable and education (educ) and experience (exper) as explanatory variables is analyzed. The data is a sample of 428 women, downloaded from the companion website of “Principles of Econometrics” by R.C. Hill, W.E. Griffiths.
Table 1.
First, a simple OLS regression model was estimated as a baseline. Table 1 shows that one additional year of education is associated with a wage increase of about 10.9%, and one additional year of experience with an increase of about 1.5%. But this is a rather unrealistic result, as a 10.9% increase is hard to come by in practice.
Therefore, we may have an endogeneity problem: educ might be an endogenous variable. It is quite possible that this simple OLS model omits some variables that affect the dependent variable and are also correlated with educ.
Hence, the instrumental variable procedure was applied, with mother’s education (mothereduc) as the instrument. Mother’s education is assumed to have no direct effect on the daughter’s wage, so it is not itself an explanatory variable in the wage equation. Moreover, mother’s education should be correlated with the daughter’s education level, as educated mothers tend to provide at least the same level of education for their children. I also assumed that mother’s education is not correlated with the error term.
The LS estimate is as follows: lnwage = -0.4 + 0.109*educ + 0.015*exper
First stage of 2SLS: educ = c + x1*mothereduc + x2*exper
The main idea of the method is shown above. After the first-stage estimation, the fitted values of educ are substituted for educ in the first equation, which is then estimated by least squares (the second stage). Then the final result can be shown.
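The two stages can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the data is simulated (it is not the 428-observation sample), the coefficients in the data-generating step are made up, the helper `ols` is written here just for the example, and the unobserved `ability` variable plays the role of the omitted factor.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (assumption): the true return to education is 0.05.
random.seed(0)
lnwage, educ, exper, mothereduc = [], [], [], []
for _ in range(5000):
    ability = random.gauss(0, 1)          # unobserved, ends up in the error term
    m = random.gauss(0, 1)                # instrument: moves educ, not the wage directly
    x = random.gauss(0, 1)
    e = 0.8 * m + 0.7 * ability + random.gauss(0, 1)
    lnwage.append(0.05 * e + 0.015 * x + 0.3 * ability + random.gauss(0, 0.5))
    educ.append(e)
    exper.append(x)
    mothereduc.append(m)

# Plain OLS: lnwage on const, educ, exper.
b_ols = ols([[1.0, e, x] for e, x in zip(educ, exper)], lnwage)

# Stage 1: educ on const, mothereduc, exper; keep the fitted values.
Z = [[1.0, m, x] for m, x in zip(mothereduc, exper)]
g = ols(Z, educ)
educ_hat = [sum(gi * zi for gi, zi in zip(g, z)) for z in Z]

# Stage 2: lnwage on const, fitted educ, exper.
b_2sls = ols([[1.0, eh, x] for eh, x in zip(educ_hat, exper)], lnwage)

print("OLS  educ coefficient:", b_ols[1])
print("2SLS educ coefficient:", b_2sls[1])
```

In this simulation the OLS coefficient on educ is pushed upward by the omitted ability term, while the 2SLS coefficient lands near the assumed true value. Note that standard errors taken from a hand-run second stage are not correct; a dedicated TSLS routine adjusts them.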
Table 2.
Now the 2SLS regression shows that an additional year of education increases the wage by about 5.4%, which is more realistic than the previous result. To check whether the instrumental variable is needed at all, let’s run the Hausman test for the exogeneity of educ in this second model.
TSLS, using observations 1-428
Dependent variable: LNWAGE
Hausman test -
          Null hypothesis: OLS estimates are consistent
          Asymptotic test statistic: Chi-square(1) = 2.72781
          with p-value = 0.0986144
Weak instrument test -
          First-stage F-statistic (1, 425) = 75.2245
According to the Hausman test, the p-value (0.0986) is higher than 0.05, so we cannot reject the null hypothesis at the 5% significance level. In other words, there is no significant evidence of endogeneity at the 5% level, and the first OLS model is preferred for estimation (note that the null would be rejected at the 10% level). As for the weak instrument test, the first-stage F-statistic (75.22) is well above the rule-of-thumb value of 10. Hence, we reject the hypothesis that the (external) instrument mothereduc is a weak instrument.
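The reported Hausman p-value can be checked by hand. For one degree of freedom, the chi-square tail probability is P(Z² > x) = erfc(sqrt(x/2)), so a minimal Python sketch looks like this (the function name `chi2_sf_1df` is my own, it is not something gretl provides):

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square(1) variable: P(Z**2 > x)."""
    return math.erfc(math.sqrt(x / 2.0))

hausman_stat = 2.72781                 # Chi-square(1) statistic from the output above
p_value = chi2_sf_1df(hausman_stat)
print(p_value)                         # compare with the reported 0.0986144
print("reject at 5%: ", p_value < 0.05)
print("reject at 10%:", p_value < 0.10)

first_stage_F = 75.2245                # first-stage F-statistic from the output above
print("above rule of thumb (10):", first_stage_F > 10)
```

The same function works for any test statistic with one degree of freedom; for more degrees of freedom a proper chi-square routine is needed.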
Justification of using Instrumental Variable method
We might encounter endogeneity problems while estimating a model. Endogeneity occurs when an explanatory variable is correlated with the error term. Basically it means cov(e, x) ≠ 0, in which case x is endogenous. One of the most common causes is an omitted variable. For example:
Let’s consider a simple regression: y = C + b1*x1 + b2*x2 + e
So the error term “e” covers all factors other than x1 and x2. But what if there is another variable x3 that also has an effect on y? Then the error term can be written as: e = b3*x3 + v
Now full equation becomes: y = C + b1*x1 + b2*x2 + b3*x3 + v
The danger is that, if x3 is correlated with x1 or x2, the LS estimates are biased and may overestimate the effect of x1 and x2 on y. Endogeneity can also arise when an explanatory variable is only a proxy for the true variable. For example, it is quite difficult to measure a worker’s ability exactly, so it can only be approximated.
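A small simulation makes the omitted-variable problem concrete. This is a sketch under my own assumptions: the data is simulated, the coefficient values are made up, and x2 is left out to keep the example short. When x3 is omitted, the LS slope on x1 picks up roughly b1 + b3*d, where d is the slope from regressing x3 on x1.

```python
import random

def slope(x, y):
    """Slope of a simple least-squares regression of y on x (with a constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
b1, b3 = 0.5, 0.8                          # assumed true effects of x1 and x3
x1, x3, y = [], [], []
for _ in range(20000):
    z = random.gauss(0, 1)                 # the omitted variable x3
    x = 0.6 * z + random.gauss(0, 1)       # x1 is correlated with x3
    x3.append(z)
    x1.append(x)
    y.append(1.0 + b1 * x + b3 * z + random.gauss(0, 1))

short = slope(x1, y)                       # regression of y on x1 that omits x3
d = slope(x1, x3)                          # slope of x3 on x1
print("LS slope with x3 omitted:", short)
print("b1 + b3*d:               ", b1 + b3 * d)
```

With a positive b3 and a positive correlation between x1 and x3, the estimated effect of x1 comes out larger than the assumed true value of 0.5, which is the overestimation described above.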
Let’s take a look at another regression model. In this case, the data contains wages, education, experience, father’s and mother’s education levels, as well as a host of other potential explanatory variables. The data was downloaded from the companion website of the textbook “Introductory Econometrics: A Modern Approach” by Wooldridge.
Model 2: OLS, using observations 1-935
Dependent variable: lwage
             Coefficient   Std. Error    t-ratio   p-value
const        5.50271       0.112037      49.1151   <0.0001   ***
educ         0.077782      0.00657687    11.8266   <0.0001   ***
exper        0.0197768     0.00330251    5.9884    <0.0001   ***

Mean dependent var   6.779004    S.D. dependent var   0.421144
Sum squared resid    143.9786    S.E. of regression   0.393044
R-squared            0.130859    Adjusted R-squared   0.128994
F(2, 932)            70.16174    P-value(F)           4.13e-29
Log-likelihood       −452.0704   Akaike criterion     910.1407
Schwarz criterion    924.6624    Hannan-Quinn         915.6779
The simple LS result, obtained by following the same procedure as for the previous models, is shown above. According to the result, an additional year of education is associated with a wage increase of about 7.8%, and an additional year of experience with an increase of about 2.0%.
There are likely a number of omitted factors that are correlated with educ. This means that educ is likely endogenous, which makes the LS estimator both biased and inconsistent. Hence, it is necessary to find potential instrumental variables. As in the previous models, parents’ education level is likely correlated with the children’s education, while it is assumed to affect the dependent variable only through education.
Therefore, both father’s and mother’s education levels (feduc and meduc) were chosen as instrumental variables for educ, and the model was estimated by TSLS.
Model 3: TSLS, using observations 1-932 (n = 722)
Missing or incomplete observations dropped: 210
Dependent variable: lwage
Instrumented: educ
Instruments: const meduc feduc exper
             Coefficient   Std. Error    z         p-value
const        4.42943       0.315519      14.0386   <0.0001   ***
educ         0.142298      0.0191804     7.4189    <0.0001   ***
exper        0.0376059     0.00572293    6.5711    <0.0001   ***

Mean dependent var   6.799923    S.D. dependent var   0.419385
Sum squared resid    121.2749    S.E. of regression   0.410696
R-squared            0.139428    Adjusted R-squared   0.137035
F(2, 719)            28.33596    P-value(F)           1.43e-12
Hausman test -
          Null hypothesis: OLS estimates are consistent
          Asymptotic test statistic: Chi-square(1) = 15.4483
          with p-value = 8.47936e-005
Sargan over-identification test -
          Null hypothesis: all instruments are valid
          Test statistic: LM = 0.0420364
          with p-value = P(Chi-square(1) > 0.0420364) = 0.83755
Weak instrument test -
          First-stage F-statistic (2, 718) = 67.2316
As the p-value of the Hausman test is very small (about 0.00008), we can reject the null hypothesis at the 5% significance level. Therefore, there is evidence of endogeneity at the 5% level: the first OLS model is not suitable and the TSLS model is better suited for estimation. As for the weak instrument test, the first-stage F-statistic (67.23) is well above the rule-of-thumb value of 10. Hence, we reject the hypothesis that the (external) instruments, father’s and mother’s education, are weak instruments. Moreover, the Sargan test (p-value 0.838) does not reject the null hypothesis that all instruments are valid.
In this case, it was better to use IV methods to estimate the model.
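For readers curious about what the Sargan statistic measures, here is a rough Python sketch of one common way to compute it: take the 2SLS residuals, regress them on all instruments, and use LM = n * R-squared. Everything here is an assumption for illustration: the data is simulated, exper is left out, the helpers `ols`, `dot`, `sargan` and `simulate` are my own, and the p-value formula only covers the case of one overidentifying restriction (two instruments for one endogenous variable). I am not claiming this is exactly what gretl does internally.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):                     # Gauss-Jordan with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def dot(b, row):
    return sum(bi * xi for bi, xi in zip(b, row))

def sargan(y, educ, instruments):
    """Return (LM, p-value), assuming two instruments and one endogenous regressor."""
    n = len(y)
    Z = [[1.0] + list(z) for z in instruments]
    g = ols(Z, educ)                                   # stage 1
    educ_hat = [dot(g, z) for z in Z]
    b = ols([[1.0, eh] for eh in educ_hat], y)         # stage 2
    # 2SLS residuals use the actual educ, not the fitted values.
    u = [yi - b[0] - b[1] * e for yi, e in zip(y, educ)]
    d = ols(Z, u)                                      # residuals on all instruments
    u_bar = sum(u) / n
    ess = sum((dot(d, z) - u_bar) ** 2 for z in Z)
    tss = sum((ui - u_bar) ** 2 for ui in u)
    lm = n * ess / tss
    return lm, math.erfc(math.sqrt(lm / 2.0))          # chi-square(1) tail

def simulate(direct_effect, n=2000, seed=1):
    """Simulated wages; direct_effect != 0 makes feduc an invalid instrument."""
    rng = random.Random(seed)
    y, educ, inst = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        meduc, feduc = rng.gauss(0, 1), rng.gauss(0, 1)
        e = 0.5 * meduc + 0.5 * feduc + 0.7 * ability + rng.gauss(0, 1)
        y.append(0.1 * e + 0.3 * ability + direct_effect * feduc + rng.gauss(0, 0.5))
        educ.append(e)
        inst.append((meduc, feduc))
    return y, educ, inst

lm_ok, p_ok = sargan(*simulate(direct_effect=0.0))     # both instruments valid
lm_bad, p_bad = sargan(*simulate(direct_effect=0.5))   # feduc affects the wage directly
print("valid instruments:  LM =", lm_ok, " p =", p_ok)
print("invalid instrument: LM =", lm_bad, " p =", p_bad)
```

When one instrument has a direct effect on the wage, the residuals stay correlated with the instruments and the LM statistic becomes very large. A large p-value, as in Model 3, only means the test found no evidence against the instruments.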

DULGUUN.G
