This time I wanted to dig into some econometrics and give a brief and simple explanation of the instrumental variable (IV) method. I am writing this post in English so that everyone who needs it can follow the concept. So without further ado, let’s get to it.
In this assignment, I analyze the relationship between a woman’s wage (lnwage, the log of wage) as the dependent variable and education (educ) and experience (exper) as explanatory variables. The data is a sample of 428 women, downloaded from the companion website of “Principles of Econometrics” by R.C. Hill and W.E. Griffiths.
Table 1.
First, a simple OLS regression was estimated as a baseline. Table 1 shows that one additional year of education is associated with a wage increase of about 10.9%, and one additional year of experience with an increase of about 1.5%. A 10.9% return to a single year of education looks too high to be realistic.
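A small side note on reading these numbers: since the dependent variable is a logarithm, "coefficient times 100" is only an approximation of the percentage change. The sketch below (the helper name pct_effect is my own, not gretl output) compares the approximation with the exact value for the two coefficients quoted above.

```python
import math

def pct_effect(coef: float) -> float:
    """Exact percentage change in y for a one-unit change in x
    when the dependent variable is ln(y)."""
    return 100.0 * (math.exp(coef) - 1.0)

# Coefficients quoted in the text for the OLS wage equation.
for name, coef in [("educ", 0.109), ("exper", 0.015)]:
    print(f"{name}: approx {100 * coef:.1f}%, exact {pct_effect(coef):.2f}%")
```

For small coefficients the two numbers are close, so the usual reading is fine here.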
Therefore, we may have an endogeneity problem: educ might be an endogenous variable. It is quite possible that this simple OLS model omits some variables that affect the dependent variable and are also correlated with educ.
Hence, the instrumental variable procedure was applied, with mother’s education (mothereduc) as the instrument. Mother’s education has no direct effect on the daughter’s wage, so it is not an explanatory variable in the wage equation. Moreover, mother’s education should be correlated with the daughter’s education, as educated mothers tend to provide at least the same level of education for their children. I assume that mother’s education is not correlated with the error term.
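The LS estimate is as follows: lnwage = -0.4 + 0.109*educ + 0.015*exper
2SLS first stage: educ = c + x1*mothereduc + x2*exper
The main idea of the method is shown above. In the first stage, educ is regressed on the instrument and the exogenous regressor. The fitted values of educ from this first stage are then substituted for educ in the original wage equation, which is estimated by least squares in the second stage to give the final result.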
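To make the two stages concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data is generated, not the wage sample from this post, the "ability" variable and all coefficients are made up, and the helper ols is my own. It only reuses the variable names from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (assumption: not the wage data from the post).
ability = rng.normal(size=n)        # unobserved, ends up in the error term
mothereduc = rng.normal(size=n)     # instrument: moves educ, unrelated to the error
exper = rng.normal(size=n)          # exogenous regressor
educ = 1.0 * mothereduc + 0.8 * ability + rng.normal(size=n)
lnwage = 1.0 + 0.05 * educ + 0.02 * exper + 0.3 * ability + rng.normal(size=n)

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Plain OLS: biased because educ is correlated with the omitted ability.
b_ols = ols(lnwage, np.column_stack([const, educ, exper]))

# Stage 1: regress educ on the instrument and the exogenous regressor.
Z = np.column_stack([const, mothereduc, exper])
educ_hat = Z @ ols(educ, Z)

# Stage 2: replace educ by its fitted values in the wage equation.
b_2sls = ols(lnwage, np.column_stack([const, educ_hat, exper]))

print("true educ coefficient: 0.05")
print("OLS  educ coefficient:", round(b_ols[1], 3))
print("2SLS educ coefficient:", round(b_2sls[1], 3))
```

One caveat: doing the second stage by hand gives the right coefficients, but the standard errors from that regression are not correct. A dedicated TSLS routine, like the one in gretl, adjusts them.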
Table 2.
The 2SLS regression now shows that an additional year of education increases the wage by about 5.4%, which is more realistic than the previous result. To check whether the instrumental variable approach is actually needed, let’s run the Hausman test for the exogeneity of educ in the second model.
TSLS, using observations 1-428
Dependent variable: LNWAGE
Hausman test -
Null hypothesis: OLS estimates are consistent
Asymptotic test statistic: Chi-square(1) = 2.72781
with p-value = 0.0986144
Weak instrument test -
First-stage F-statistic (1, 425) = 75.2245
According to the Hausman test, the p-value (0.099) is higher than 0.05, so we cannot reject the null hypothesis at the 5% significance level. We therefore find no evidence of endogeneity at the 5% level, and the first OLS model is better suited for estimation. (At the 10% level the null would be rejected, so the conclusion is not clear-cut.) As for the weak instrument test, the first-stage F-statistic (75.2) is well above the rule-of-thumb value of 10. Hence, we reject the hypothesis that the (external) instrument mothereduc is a weak instrument.
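The idea behind the Hausman test can be sketched with its regression-based variant (often called Durbin-Wu-Hausman): add the first-stage residuals to the original equation and check whether they are significant. This is only an illustration on simulated data with made-up coefficients, and I am not claiming that gretl computes its chi-square statistic in exactly this way. The helper ols_fit is my own.

```python
import numpy as np

def ols_fit(y, X):
    """Return coefficients, conventional standard errors and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, resid

rng = np.random.default_rng(1)
n = 5_000
u = rng.normal(size=n)                       # structural error
z = rng.normal(size=n)                       # instrument
x = 1.0 * z + 0.7 * u + rng.normal(size=n)   # endogenous: depends on u
y = 1.0 + 0.5 * x + u

const = np.ones(n)

# First stage: regress x on the instrument and keep the residuals.
_, _, v_hat = ols_fit(x, np.column_stack([const, z]))

# Augmented regression: y on x and the first-stage residuals.
# A significant coefficient on v_hat signals that x is endogenous.
beta, se, _ = ols_fit(y, np.column_stack([const, x, v_hat]))
t_vhat = beta[2] / se[2]

print("t-statistic on first-stage residual:", round(t_vhat, 2))
```

In this simulation x is endogenous by construction, so the t-statistic comes out large. With an exogenous x it would be close to zero, which is the situation where OLS is preferred.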
Justification of using the Instrumental Variable method
We might encounter problems with endogeneity while estimating a model. Endogeneity occurs when an explanatory variable is correlated with the error term. In other words, cov(e, x) ≠ 0 and x is endogenous. One of the most common causes is an omitted variable. For example:
Let’s consider a simple regression: y = C + b1*x1 + b2*x2 + e
The error term “e” covers all factors other than x1 and x2. But what if there is another variable x3 which has an effect on y? Then the error term is:
e = b3*x3 + v
The full equation becomes: y = C + b1*x1 + b2*x2 + b3*x3 + v
The danger is that, if x3 is correlated with x1 or x2, the LS estimates pick up part of the effect of x3 and may overestimate the effect of x1 and x2 on y. Endogeneity can also arise when an explanatory variable is only a proxy for the true variable. For example, it is very difficult to measure a worker’s ability exactly; it can only be approximated.
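Here is a small simulation of the omitted variable problem. To keep it short I assume a simplified version of the model above with only one included regressor x1 (no x2), and all the numbers (b1, b3, delta) are hypothetical. When x3 is left out and is correlated with x1, the slope on x1 lands near b1 + b3*delta instead of b1.

```python
import random

random.seed(0)
n = 50_000
b1, b3 = 1.0, 2.0      # hypothetical true coefficients
delta = 0.5            # x3 = delta * x1 + noise, so x3 is correlated with x1

x1 = [random.gauss(0, 1) for _ in range(n)]
x3 = [delta * a + random.gauss(0, 1) for a in x1]
y = [b1 * a + b3 * c + random.gauss(0, 1) for a, c in zip(x1, x3)]

def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

short = slope(x1, y)   # regression that omits x3

print("true b1:", b1)
print("estimate with x3 omitted:", round(short, 3))
print("b1 + b3 * delta:", b1 + b3 * delta)
```

Because b3 and delta are both positive here, the effect of x1 is overestimated. With opposite signs it would be underestimated.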
Let’s take a look at another regression model. In this case, the data contains hourly wages, education, experience, father’s and mother’s education levels, and a host of other potential explanatory variables. The data was downloaded from the companion website of the textbook “Introductory Econometrics: A Modern Approach” by Wooldridge.
Model 2: OLS, using observations 1-935
Dependent variable: lwage

| | Coefficient | Std. Error | t-ratio | p-value | |
|---|---|---|---|---|---|
| const | 5.50271 | 0.112037 | 49.1151 | <0.0001 | *** |
| educ | 0.077782 | 0.00657687 | 11.8266 | <0.0001 | *** |
| exper | 0.0197768 | 0.00330251 | 5.9884 | <0.0001 | *** |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Mean dependent var | 6.779004 | S.D. dependent var | 0.421144 |
| Sum squared resid | 143.9786 | S.E. of regression | 0.393044 |
| R-squared | 0.130859 | Adjusted R-squared | 0.128994 |
| F(2, 932) | 70.16174 | P-value(F) | 4.13e-29 |
| Log-likelihood | −452.0704 | Akaike criterion | 910.1407 |
| Schwarz criterion | 924.6624 | Hannan-Quinn | 915.6779 |
According to the results, an additional year of education increases the wage by about 7.8%, and an additional year of experience raises the wage by about 2.0%.
Following the same procedure as for the previous models, the simple LS result is shown above. There are likely a number of omitted factors that are correlated with educ. This means that educ is likely endogenous, which makes the LS estimator both biased and inconsistent. Hence, it is necessary to find potential instrumental variables. As in the previous models, parents’ education is likely to be correlated with the children’s education, and is assumed to affect the dependent variable only through it.
Therefore, both father’s and mother’s education levels were chosen as instrumental variables for educ, and the model was estimated by TSLS.
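With two instruments for one endogenous variable, the estimator can be written in matrix form, and the first-stage F-statistic checks whether the instruments jointly explain educ. The sketch below uses simulated stand-ins for meduc, feduc, exper, educ and lwage with made-up coefficients, so the numbers it prints are not the ones from Model 3. The helper rss is my own.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Simulated stand-ins for the variables named in the text (assumption).
u = rng.normal(size=n)                       # error term
meduc = rng.normal(size=n)
feduc = 0.5 * meduc + rng.normal(size=n)
exper = rng.normal(size=n)
educ = 0.4 * meduc + 0.4 * feduc + 0.6 * u + rng.normal(size=n)
lwage = 5.0 + 0.08 * educ + 0.02 * exper + u

const = np.ones(n)
X = np.column_stack([const, educ, exper])          # regressors
Z = np.column_stack([const, meduc, feduc, exper])  # instruments

# 2SLS: project X on Z, then solve (X_hat'X) b = X_hat'y.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ lwage)

def rss(y, W):
    """Residual sum of squares from regressing y on W."""
    r = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    return r @ r

# First-stage F: do meduc and feduc jointly explain educ, given exper?
rss_u = rss(educ, Z)                                # with the instruments
rss_r = rss(educ, np.column_stack([const, exper]))  # without them
q = 2                                               # number of excluded instruments
F = ((rss_r - rss_u) / q) / (rss_u / (n - Z.shape[1]))

print("2SLS coefficients:", np.round(b_2sls, 3))
print("first-stage F:", round(F, 1))
```

The degrees of freedom of this F-statistic are (q, n - 4), which is the same pattern as the (2, 718) reported by gretl for 722 observations.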
Model 3: TSLS, using observations 1-932 (n = 722)
Missing or incomplete observations dropped: 210
Dependent variable: lwage
Instrumented: educ
Instruments: const meduc feduc exper

| | Coefficient | Std. Error | z | p-value | |
|---|---|---|---|---|---|
| const | 4.42943 | 0.315519 | 14.0386 | <0.0001 | *** |
| educ | 0.142298 | 0.0191804 | 7.4189 | <0.0001 | *** |
| exper | 0.0376059 | 0.00572293 | 6.5711 | <0.0001 | *** |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Mean dependent var | 6.799923 | S.D. dependent var | 0.419385 |
| Sum squared resid | 121.2749 | S.E. of regression | 0.410696 |
| R-squared | 0.139428 | Adjusted R-squared | 0.137035 |
| F(2, 719) | 28.33596 | P-value(F) | 1.43e-12 |

Hausman test -
Null hypothesis: OLS estimates are consistent
Asymptotic test statistic: Chi-square(1) = 15.4483
with p-value = 8.47936e-005
Sargan over-identification test -
Null hypothesis: all instruments are valid
Test statistic: LM = 0.0420364
with p-value = P(Chi-square(1) > 0.0420364) = 0.83755
Weak instrument test -
First-stage F-statistic (2, 718) = 67.2316
As the p-value of the Hausman test is very small, we can reject the null hypothesis at the 5% significance level. Therefore, we have evidence of endogeneity at the 5% level: the OLS model (Model 2) is not suitable for estimation, and the TSLS model is better suited. As for the weak instrument test, the first-stage F-statistic (67.2) is well above the rule-of-thumb value of 10. Hence, we reject the hypothesis that the (external) instruments, father’s and mother’s education, are weak instruments. Moreover, the Sargan test does not reject the null hypothesis that all instruments are valid (p-value = 0.84).
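As a quick check, the p-values in the Model 3 output can be reproduced from the reported test statistics. Both the Hausman and the Sargan statistic are chi-square with 1 degree of freedom here, and for that case the upper tail probability can be written with the complementary error function. The function name chi2_sf_df1 is my own.

```python
import math

def chi2_sf_df1(stat: float) -> float:
    """P(X > stat) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Statistics copied from the Model 3 output above.
hausman = 15.4483
sargan = 0.0420364
first_stage_f = 67.2316

p_hausman = chi2_sf_df1(hausman)
p_sargan = chi2_sf_df1(sargan)

print(f"Hausman p-value: {p_hausman:.3g} -> reject exogeneity of educ: {p_hausman < 0.05}")
print(f"Sargan  p-value: {p_sargan:.3g} -> reject instrument validity: {p_sargan < 0.05}")
print(f"First-stage F above 10: {first_stage_f > 10}")
```

The same function applied to the earlier Hausman statistic of 2.72781 gives about 0.0986, matching the output of the first TSLS model.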
In this case, it was better to use the IV method to estimate the model.
DULGUUN.G