Dulguun's Blog: Deeper Knowledge: 2 stage least squares model – Instrumental variable method and justification

This time I wanted to dig into some econometrics and give a brief and simple explanation of the instrumental variable method. I am also writing this post in English so that everyone who needs it can follow the concept. So without further ado, let’s get to it.

In this assignment, I analyze the relationship between women’s wages (lnwage) as the dependent variable and education (educ) and experience (exper) as explanatory variables. The data is a sample of 428 women, downloaded from the companion website of “Principles of Econometrics” by R.C. Hill and W.E. Griffiths.

Table 1.

First, a simple OLS regression was estimated as a baseline. Table 1 shows that one additional year of education is associated with a 10.9% higher wage, and one additional year of experience with a 1.5% higher wage. A 10.9% return per year of education looks too high to be realistic, though.

Therefore, we may have an endogeneity problem: educ might be an endogenous variable. It is quite possible that this simple OLS model omits some variables that affect the dependent variable and are also correlated with educ.

Hence, an instrumental variable procedure was applied, with mother’s education (mothereduc) as the instrument. Mother’s education should have no direct effect on the daughter’s wage, so it is not an explanatory variable in the wage equation. At the same time, it should be correlated with the daughter’s education level, as educated mothers tend to provide at least the same level of education for their children. I assumed mother’s education is not correlated with the error term.

The LS estimate is as follows: lnwage = -0.4 + 0.109*educ + 0.015*exper

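2SLS, first stage: educ = c + x1*mothereduc + x2*exper

The main idea of the method is shown above. After the first-stage estimation, the fitted values of educ are substituted back into the first equation in place of educ, and that equation is estimated again (the second stage). Then the final result can be shown.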
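To make the two stages concrete, here is a minimal sketch in Python with NumPy. It does not use the 428-women sample: the data is simulated, and the variable `ability` and all coefficients (for example the true return to education of 0.05) are assumptions made up for illustration only.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5000

# Simulated data (hypothetical, NOT the sample used in this post)
ability = rng.normal(size=n)        # unobserved, so it ends up in the error term
mothereduc = rng.normal(size=n)     # instrument
exper = rng.normal(size=n)          # exogenous regressor
educ = 0.5 * mothereduc + 0.8 * ability + rng.normal(size=n)
lnwage = 0.05 * educ + 0.015 * exper + 0.3 * ability + rng.normal(scale=0.5, size=n)

const = np.ones(n)

# Plain OLS of lnwage on educ and exper (biased, because ability is omitted)
b_ols = ols(np.column_stack([const, educ, exper]), lnwage)

# Stage 1: regress educ on the instrument and the exogenous regressor
Z = np.column_stack([const, mothereduc, exper])
educ_hat = Z @ ols(Z, educ)

# Stage 2: regress lnwage on the fitted educ and exper
b_2sls = ols(np.column_stack([const, educ_hat, exper]), lnwage)

print("OLS  educ coefficient:", round(b_ols[1], 3))
print("2SLS educ coefficient:", round(b_2sls[1], 3))
```

In this simulation the OLS coefficient on educ is pushed upwards by the omitted ability, while the 2SLS coefficient lands much closer to the assumed 0.05. Note that running the second stage by hand like this gives correct coefficients but not correct standard errors, which is one reason to let the econometrics package do 2SLS in practice.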

Table 2.

Now the 2SLS regression shows that an additional year of education raises the wage by 5.4%, which is more realistic than the previous result. To check whether the instrumental variable is actually needed, let’s run the Hausman test for the exogeneity of educ in the second model.

TSLS, using observations 1-428

Dependent variable: LNWAGE

Hausman test -

Null hypothesis: OLS estimates are consistent Asymptotic test statistic: Chi-square(1) = 2.72781 with p-value = 0.0986144

Weak instrument test -

First-stage F-statistic (1, 425) = 75.2245

According to the Hausman test, the p-value (0.0986) is higher than 0.05, so we cannot reject the null hypothesis at the 5% significance level. In other words, we find no evidence of endogeneity at the 5% level, and the first OLS model is adequate for estimation (note that the null hypothesis would be rejected at the 10% level). As for the weak instrument test, the first-stage F-statistic (75.2) is well above the usual rule-of-thumb value of 10. Hence, we reject the hypothesis that the (external) instrument mothereduc is a weak instrument.
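As a quick sanity check, the reported p-value can be recomputed from the test statistic using only the Python standard library. The helper name `chi2_sf_1df` is my own, and the shortcut through `math.erfc` only works for one degree of freedom, which is the case here.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # If Z is standard normal, Z**2 is chi-square(1), so
    # P(Z**2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

hausman_stat = 2.72781      # taken from the output above
p_value = chi2_sf_1df(hausman_stat)
print(f"Hausman p-value: {p_value:.4f}")
print("Reject exogeneity at 5%: ", p_value < 0.05)
print("Reject exogeneity at 10%:", p_value < 0.10)

first_stage_f = 75.2245     # taken from the output above
print("Passes the F > 10 rule of thumb:", first_stage_f > 10)
```

The decision clearly depends on the chosen significance level: the same statistic fails to reject at 5% but rejects at 10%.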

Justification of using Instrumental Variable method

We might encounter endogeneity problems while estimating a model. Endogeneity occurs when an explanatory variable is correlated with the error term. Basically it means cov(e, x) ≠ 0, and x is then called endogenous. One of the most common causes is an omitted variable. For example:

Let’s consider simple regression: y = C + b1*x1 + b2*x2 + e

So the error term “e” covers all factors other than x1 and x2. But what if there is another variable x3 which has an effect on y? Then the error term is: e = b3*x3 + v

Now full equation becomes: y = C + b1*x1 + b2*x2 + b3*x3 + v

The danger is that, if x3 is correlated with x1 or x2, the LS estimates are biased and might, for example, overestimate the effect of x1 and x2 on y. Endogeneity can also arise when an explanatory variable is only an approximation (a proxy) of what we really want to measure. For example, it is quite difficult to measure a worker’s ability exactly; we can only approximate it.
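The omitted-variable case can be illustrated with a small simulation. This is only a sketch under my own assumptions: the data is simulated, x2 is dropped to keep it short, and the true coefficients (b1 = 1, b3 = 2) and the 0.6 link between x1 and x3 are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

x1 = rng.normal(size=n)
x3 = 0.6 * x1 + rng.normal(size=n)   # omitted variable, correlated with x1
v = rng.normal(size=n)
b1, b3 = 1.0, 2.0                     # hypothetical true coefficients
y = 0.5 + b1 * x1 + b3 * x3 + v

const = np.ones(n)

# Short regression: x3 is left in the error term e = b3*x3 + v
short, *_ = np.linalg.lstsq(np.column_stack([const, x1]), y, rcond=None)

# Long regression: x3 is included
long_, *_ = np.linalg.lstsq(np.column_stack([const, x1, x3]), y, rcond=None)

# Omitted-variable bias for the short regression: b3 * cov(x1, x3) / var(x1)
expected_bias = b3 * np.cov(x1, x3)[0, 1] / np.var(x1, ddof=1)

print("short regression b1:", round(short[1], 2))
print("long regression  b1:", round(long_[1], 2))
print("expected bias      :", round(expected_bias, 2))
```

With these assumed numbers the short regression overestimates b1 by roughly b3 times the slope of x3 on x1, while the long regression recovers a value close to the true 1.0. If x3 were negatively correlated with x1, the bias would go the other way.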

Let’s take a look at another regression model. In this case, the data contains hourly wages, education, experience, father’s and mother’s education level, as well as a host of other potential explanatory variables. The data was downloaded from the companion website of the textbook “Introductory Econometrics: A Modern Approach” by Wooldridge.

Model 2: OLS, using observations 1-935

Dependent variable: lwage

	Coefficient	Std. Error	t-ratio	p-value
const	5.50271	0.112037	49.1151	<0.0001	***
educ	0.077782	0.00657687	11.8266	<0.0001	***
exper	0.0197768	0.00330251	5.9884	<0.0001	***

Mean dependent var	6.779004	S.D. dependent var	0.421144
Sum squared resid	143.9786	S.E. of regression	0.393044
R-squared	0.130859	Adjusted R-squared	0.128994
F(2, 932)	70.16174	P-value(F)	4.13e-29
Log-likelihood	−452.0704	Akaike criterion	910.1407
Schwarz criterion	924.6624	Hannan-Quinn	915.6779

According to the result, an additional year of education raises the wage by about 7.8%, and an additional year of experience by about 2.0%.
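Reading a coefficient directly as a percentage is an approximation that works well for small coefficients in a model with a log dependent variable. A short sketch of the exact conversion, using the coefficients from Model 2 (the helper name `pct_effect` is my own):

```python
import math

def pct_effect(coef):
    """Exact percentage change in y for a one-unit change in x
    when the dependent variable is log(y)."""
    return 100.0 * (math.exp(coef) - 1.0)

b_educ, b_exper = 0.077782, 0.0197768   # coefficients from Model 2
print(f"educ:  approx {100 * b_educ:.1f}%, exact {pct_effect(b_educ):.1f}%")
print(f"exper: approx {100 * b_exper:.1f}%, exact {pct_effect(b_exper):.1f}%")
```

For coefficients of this size the two readings differ only by a few tenths of a percentage point, so the simple reading is fine for the discussion here.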

Following the same procedure as for the previous models, the simple LS result is shown above. There are likely a number of omitted factors that are correlated with educ. This means that educ is likely endogenous, which makes the LS estimator both biased and inconsistent. Hence, it’s necessary to find potential instrumental variables. As in the previous models, parents’ education level can be expected to be correlated with their children’s education while having no direct effect on the dependent variable.

Therefore, both father’s and mother’s education levels were chosen as instruments for educ, and the model was estimated by TSLS.

Model 3: TSLS, using observations 1-932 (n = 722)

Missing or incomplete observations dropped: 210

Dependent variable: lwage

Instrumented: educ

Instruments: const meduc feduc exper

	Coefficient	Std. Error	z	p-value
const	4.42943	0.315519	14.0386	<0.0001	***
educ	0.142298	0.0191804	7.4189	<0.0001	***
exper	0.0376059	0.00572293	6.5711	<0.0001	***

Mean dependent var	6.799923	S.D. dependent var	0.419385
Sum squared resid	121.2749	S.E. of regression	0.410696
R-squared	0.139428	Adjusted R-squared	0.137035
F(2, 719)	28.33596	P-value(F)	1.43e-12

Hausman test -

          Null hypothesis: OLS estimates are consistent
          Asymptotic test statistic: Chi-square(1) = 15.4483
          with p-value = 8.47936e-005

Sargan over-identification test -

          Null hypothesis: all instruments are valid
          Test statistic: LM = 0.0420364
          with p-value = P(Chi-square(1) > 0.0420364) = 0.83755

Weak instrument test -

First-stage F-statistic (2, 718) = 67.2316

As the p-value of the Hausman test is very small, we can reject the null hypothesis at the 5% significance level. Therefore, we have evidence of endogeneity, the first OLS model is not suitable, and the TSLS model is better suited for estimation. As for the weak instrument test, the first-stage F-statistic (67.2) is well above the rule-of-thumb value of 10. Hence, we reject the hypothesis that the (external) instruments, father’s and mother’s education, are weak instruments. Moreover, the Sargan test (p-value 0.84) does not reject the null hypothesis that all instruments are valid.
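For readers who want to see what is behind the weak instrument and Sargan numbers, here is a rough sketch of both statistics in Python with NumPy. It is not the exact routine used to produce the output above and it does not use the Wooldridge data: the sample is simulated and every coefficient, as well as the unobserved `ability` variable, is an assumption for illustration.

```python
import math
import numpy as np

def fit(X, y):
    """OLS coefficients and residuals for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

rng = np.random.default_rng(2)
n = 2000

# Simulated data (hypothetical, not the sample used in this post)
ability = rng.normal(size=n)
meduc, feduc, exper = rng.normal(size=(3, n))
educ = 0.4 * meduc + 0.4 * feduc + 0.7 * ability + rng.normal(size=n)
lwage = 0.08 * educ + 0.02 * exper + 0.3 * ability + rng.normal(scale=0.4, size=n)

const = np.ones(n)
Z = np.column_stack([const, meduc, feduc, exper])   # all instruments
Z0 = np.column_stack([const, exper])                # without the external ones

# First-stage F: do meduc and feduc jointly explain educ?
_, u_full = fit(Z, educ)
_, u_restr = fit(Z0, educ)
q, k = 2, Z.shape[1]
F = ((u_restr @ u_restr - u_full @ u_full) / q) / (u_full @ u_full / (n - k))

# 2SLS coefficients, then residuals computed with the ACTUAL educ
educ_hat = educ - u_full
b_2sls, _ = fit(np.column_stack([const, educ_hat, exper]), lwage)
resid = lwage - np.column_stack([const, educ, exper]) @ b_2sls

# Sargan LM statistic: n * R^2 from regressing the 2SLS residuals on all instruments
_, e = fit(Z, resid)
r2 = 1.0 - (e @ e) / np.sum((resid - resid.mean()) ** 2)
sargan = max(n * r2, 0.0)
# Chi-square(1): two external instruments minus one endogenous regressor
p_sargan = math.erfc(math.sqrt(sargan / 2.0))

print("first-stage F:", round(F, 1))
print("Sargan LM:", round(sargan, 4), "p-value:", round(p_sargan, 4))
```

The Sargan test is only available here because there are more external instruments (two) than endogenous regressors (one). In the first example, with mothereduc as the only instrument, the model is exactly identified and this test cannot be computed.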

In this case, it was better to use IV methods to estimate the model.

DULGUUN.G