R/f_heterrorsIV.R
hetErrorsIV.Rd
This function estimates the model parameters and associated standard errors for a linear regression model with one endogenous regressor. Identification is achieved through heteroscedastic covariance restrictions within the triangular system as proposed in Lewbel (2012).
hetErrorsIV(formula, data, verbose = TRUE)
Returns an object of classes rendo.ivreg and ivreg. It extends the object returned from function ivreg of package AER and slightly modifies it by adapting the call and formula components. The summary function prints additional diagnostic information as described in the documentation for summary.ivreg.
All generic accessor functions for ivreg such as anova, hatvalues, or vcov are available.
The method proposed in Lewbel (2012) identifies structural parameters in regression models with endogenous regressors by means of variables that are uncorrelated with the product of heteroskedastic errors. The instruments are constructed as simple functions of the model's data. The method can be applied when no external instruments are available or to supplement external instruments to improve the efficiency of the IV estimator. Consider the model:
\[Y_t = \beta_0 + \beta_1 P_t + \beta_2 X_t + \epsilon_t\]
where \(t=1,..,T\) indexes either time or cross-sectional units. The endogeneity problem arises from the correlation of \(P_t\) and \(\epsilon_t\). The endogenous regressor is modeled as \(P_t = Z_t + \nu_t\), where \(Z_t\) is a subset of variables in \(X_t\).
The errors, \(\epsilon\) and \(\nu\), may be correlated with each other. Structural parameters are identified by an ordinary two-stage least squares regression of \(Y\) on \(X\) and \(P\), using \(X\) and \([Z-E(Z)]\nu\) as instruments. A vital assumption for identification is that \(cov(Z,\nu^2) \neq 0\). The strength of the instrument is proportional to the covariance of \((Z-\bar{Z})\nu\) with \(\nu\), which corresponds to the degree of heteroskedasticity of \(\nu\) with respect to \(Z\) (Lewbel 2012).
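The two-stage procedure described above can be sketched in base R on simulated data. This is an illustrative sketch only, not the package's implementation: the data-generating process, the names U, iiv and P_hat, and all coefficient values are assumptions made for the example. The manual second stage yields point estimates only; its reported standard errors are not valid 2SLS standard errors.

```r
set.seed(42)
n <- 5000

X <- rnorm(n)                       # exogenous regressor; here Z = X (assumption)
U <- rnorm(n)                       # unobserved factor shared by both errors

# First-stage error: heteroskedastic in X and correlated with eps through U
nu  <- U + exp(0.5 * X) * rnorm(n)
eps <- U + rnorm(n)

P <- 1 + X + nu                     # endogenous regressor
Y <- 2 + 1.5 * X - 1 * P + eps      # structural equation, true effect of P is -1

# Step 1: residuals of the endogenous regressor on the exogenous variables
nu_hat <- resid(lm(P ~ X))

# Step 2: internal instrument (Z - mean(Z)) * nu_hat
iiv <- (X - mean(X)) * nu_hat

# Step 3: 2SLS by hand, using X and iiv as instruments
P_hat <- fitted(lm(P ~ X + iiv))
tsls  <- lm(Y ~ X + P_hat)
ols   <- lm(Y ~ X + P)

# Compare the naive OLS estimate with the heteroskedasticity-based IV estimate
round(c(OLS = unname(coef(ols)["P"]), IV = unname(coef(tsls)["P_hat"])), 3)
```

Under these assumptions the OLS coefficient on P should be biased away from the true value of -1, while the IV estimate should lie close to it.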
The assumption that the covariance between \(Z\) and the squared error is different from zero can be empirically tested (this is checked in the background when calling the function). If it is zero or close to zero, the instrument is weak, producing imprecise estimates, with large standard errors.
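The heteroskedasticity check can be illustrated with a hand-rolled studentized (Koenker) Breusch-Pagan test: regress the squared first-stage residuals on the candidate variable and compare n times the R-squared of that regression with a chi-squared distribution. The sketch below uses base R and simulated data. The helper name studentized_bp and the data-generating process are hypothetical, and the check performed inside the function may differ in detail.

```r
set.seed(1)
n  <- 2000
X1 <- rnorm(n)
X2 <- rnorm(n)
# Error variance of P depends on X2 but not on X1 (assumed for illustration)
P  <- 1 + X1 + X2 + exp(0.5 * X2) * rnorm(n)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression
# of squared residuals on the regressors of the fitted model
studentized_bp <- function(fit) {
  u2   <- resid(fit)^2
  Z    <- model.matrix(fit)[, -1, drop = FALSE]  # regressors without intercept
  aux  <- lm(u2 ~ Z)
  stat <- length(u2) * summary(aux)$r.squared
  df   <- ncol(Z)
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

bp_x1 <- studentized_bp(lm(P ~ X1))  # X1 does not drive the error variance
bp_x2 <- studentized_bp(lm(P ~ X2))  # X2 does
rbind(X1 = bp_x1, X2 = bp_x2)
```

A small p-value, as expected for X2 in this setup, supports the assumption that the covariance between \(Z\) and the squared error is non-zero; a large p-value signals a weak instrument. The bptest function of package lmtest offers a packaged version of this test.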
The formula argument follows a four part notation: a two-sided formula describing the model (e.g. y ~ X1 + X2 + P), a single endogenous regressor (e.g. P), the exogenous variables from which the internal instrumental variables should be built (e.g. IIV(X1) + IIV(X2)), and, optionally, additional external instrumental variables. Each part is separated by a single vertical bar (|).
The instrumental variables that should be built are specified as (multiple) functions, one for each instrument. This function is IIV and uses the following arguments:
...
The exogenous regressors to build the internal instruments from. If more than one is given, separate instruments are built for each.
Note that arguments to IIV must be supplied as unquoted symbols, not as character strings.
Optionally, additional external instrumental variables to also include in the instrumental variable regression can be specified. These external instruments have to be already present in the data and are provided as the fourth right-hand side part of the formula, again separated by a vertical bar.
See the example section for illustrations on how to specify the formula parameter.
Lewbel, A. (2012). Using Heteroskedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models, Journal of Business & Economic Statistics, 30(1), 67-80.
Angrist, J. and Pischke, J.S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion, Princeton University Press.
data("dataHetIV")
# P is the endogenous regressor in all examples
# X1 generates a weak instrument but for the examples
# this is ignored
# 2 IVs, one from X1, one from X2
het <- hetErrorsIV(y~X1+X2+P|P|IIV(X1)+IIV(X2), data=dataHetIV)
#> Residuals were derived by fitting P ~ X1 + X2.
#> Warning: A studentized Breusch-Pagan test (P ~ X1) indicates at a 95% confidence level that the assumption of heteroscedasticity for the variable is not satisfied (p-value: 0.1853). The instrument built from it therefore is weak.
#> The following internal instruments were built: IIV(X1), IIV(X2).
#> Fitting an instrumental variable regression with model y ~ X1 + X2 + P|X1 + X2 + IIV(X1) + IIV(X2).
# same as above
het <- hetErrorsIV(y~X1+X2+P|P|IIV(X1,X2), data=dataHetIV)
#> Residuals were derived by fitting P ~ X1 + X2.
#> Warning: A studentized Breusch-Pagan test (P ~ X1) indicates at a 95% confidence level that the assumption of heteroscedasticity for the variable is not satisfied (p-value: 0.1853). The instrument built from it therefore is weak.
#> The following internal instruments were built: IIV(X1), IIV(X2).
#> Fitting an instrumental variable regression with model y ~ X1 + X2 + P|X1 + X2 + IIV(X1) + IIV(X2).
# use X2 as an external IV
het <- hetErrorsIV(y~X1+P|P|IIV(X1)|X2, data=dataHetIV)
#> Residuals were derived by fitting P ~ X1.
#> Warning: A studentized Breusch-Pagan test (P ~ X1) indicates at a 95% confidence level that the assumption of heteroscedasticity for the variable is not satisfied (p-value: 0.1853). The instrument built from it therefore is weak.
#> The following internal instruments were built: IIV(X1).
#> Fitting an instrumental variable regression with model y ~ X1 + P|X1 + IIV(X1) + X2.
summary(het)
#>
#> Call:
#> hetErrorsIV(formula = y ~ X1 + P | P | IIV(X1) | X2, data = dataHetIV)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -50.9748  -2.1551   0.0277   2.1476  41.1086
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  0.29763    0.12808   2.324   0.0202 *
#> X1           1.28243    0.11046  11.610   <2e-16 ***
#> P           -0.07812    0.03492  -2.237   0.0254 *
#>
#> Diagnostic tests:
#>                   df1  df2 statistic p-value
#> Weak instruments    2 2496  1344.945  <2e-16 ***
#> Wu-Hausman          1 2496   294.317  <2e-16 ***
#> Sargan              1   NA     1.203   0.273
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 5.351 on 2497 degrees of freedom
#> Multiple R-Squared: 0.08592, Adjusted R-squared: 0.08519
#> Wald test: 68.7 on 2 and 2497 DF, p-value: < 2.2e-16
#>