Fitting Linear Models with Endogenous Regressors using Lewbel's Higher Moments Approach

Fits linear models with one endogenous regressor using internal instruments constructed with the higher-moments approach described in Lewbel (1997). The technique addresses the endogeneity problem without requiring external instrumental variables, although external instruments can be incorporated if available. An important assumption for identification is that the endogenous variable has a skewed distribution.

higherMomentsIV(formula, data, verbose = TRUE)

Arguments

formula: A symbolic description of the model to be fitted. See the "Details" section for the exact notation.
data: A data.frame containing all variables referenced in the formula parameter.
verbose: Whether to print details about the running of the function, such as the internal instruments built and the model fitted.

Value

Returns an object of classes rendo.ivreg and ivreg. It extends the object returned by the function ivreg of package AER and slightly modifies it by adapting the call and formula components. The summary function prints additional diagnostic information as described in the documentation for summary.ivreg.

All generic accessor functions for ivreg such as anova, hatvalues, or vcov are available.

Details

Method

Consider the model:

\[ Y_t = \beta_0 + \beta_1 X_t + \alpha P_t + \varepsilon_t \tag{1} \]

\[ P_t = Z_t + \nu_t \tag{2} \]

The observed data consist of Y_t, X_t, and P_t, while Z_t, ε_t, and ν_t are unobserved. The endogeneity problem arises because P_t is correlated with the structural error ε_t, since E(εν) ≠ 0. The structural and measurement errors are required to have mean zero, but no restriction is imposed on their distribution.
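To make the setup concrete, the following R sketch simulates data from the model above. All parameter values and distributions (an exponential, hence skewed, Z_t; normal errors; ε_t built as 0.6·ν_t plus independent noise) are illustrative assumptions and are not taken from dataHigherMoments. It shows that P_t is skewed and correlated with ε_t, so ordinary least squares does not recover α.

```r
# Simulate Y_t = beta0 + beta1 * X_t + alpha * P_t + eps_t,  P_t = Z_t + nu_t.
# All values below are illustrative assumptions.
set.seed(123)
n   <- 5000
X   <- rnorm(n)                    # exogenous regressor X_t
Z   <- rexp(n)                     # unobserved, skewed component Z_t
nu  <- rnorm(n)                    # mean-zero measurement error nu_t
eps <- 0.6 * nu + 0.8 * rnorm(n)   # mean-zero structural error, E(eps * nu) = 0.6
P   <- Z + nu                      # observed endogenous regressor P_t
Y   <- 1 + 2 * X - 1 * P + eps     # beta0 = 1, beta1 = 2, alpha = -1

# P is correlated with eps, so OLS is inconsistent for alpha. Because X is
# independent of P and eps, its probability limit here is
# alpha + cov(P, eps) / var(P) = -1 + 0.6 / 2 = -0.7.
ols <- coef(lm(Y ~ X + P))
ols["P"]

# Sample skewness of P (positive, inherited from Z)
mean((P - mean(P))^3) / sd(P)^3
```

With this design the OLS coefficient on P should land near -0.7 rather than the true -1, which is the bias the internal instruments are meant to remove.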

Let S̅ denote the sample mean of a variable S_t, and let G_t = G(X_t) for any given function G that has finite third own and cross moments. Lewbel (1997) proves that the following instruments can be constructed and used with two-stage least squares to obtain consistent estimates:

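\[ q_{1t} = G_t - \bar{G} \tag{3a} \]

\[ q_{2t} = (G_t - \bar{G})(P_t - \bar{P}) \tag{3b} \]

\[ q_{3t} = (G_t - \bar{G})(Y_t - \bar{Y}) \tag{3c} \]

\[ q_{4t} = (Y_t - \bar{Y})(P_t - \bar{P}) \tag{3d} \]

\[ q_{5t} = (P_t - \bar{P})^2 \tag{3e} \]

\[ q_{6t} = (Y_t - \bar{Y})^2 \tag{3f} \]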

The instruments in equations (3e) and (3f) can be used only when the measurement and structural errors are symmetrically distributed. The remaining instruments do not require any distributional assumptions for the errors. Because the regressors X are already included as instruments, G(X) in equation (3a) should not be linear in X: with G(X) = X, q_1t would be an exact linear combination of the constant and X_t and would add no information.
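The collinearity argument can be checked numerically. This base R sketch (simulated x, an assumption for illustration) compares the rank of the instrument matrix [1, X, q_1] when G(X) = X versus G(X) = X².

```r
# With G(X) = X, q_1t = X_t - mean(X) is a linear combination of the
# constant and X_t, both of which are already instruments.
set.seed(42)
x <- rnorm(200)
W_linear    <- cbind(1, x, x - mean(x))       # G(X) = X
W_quadratic <- cbind(1, x, x^2 - mean(x^2))   # G(X) = X^2
qr(W_linear)$rank      # 2: the q_1 column is redundant
qr(W_quadratic)$rank   # 3: q_1 adds a new direction
```

This is why the g argument of IIV only offers nonlinear choices such as x2, x3, lnx, and 1/x.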

Let lowercase letters denote deviations from the sample mean: s_t = S_t - S̅. Then, using the variables in equations (3) as instruments together with a constant and X_t, two-stage least squares provides consistent estimates of the parameters in equation (1) under the assumptions stated in Lewbel (1997).
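The estimation step can be sketched by hand in base R. The simulated data, the choice G(X) = X², and the restriction to instruments (3a)-(3d) (which need no symmetry assumption) are illustrative assumptions; this shows the mechanics of two-stage least squares, not the internals of higherMomentsIV. The estimator is computed once in matrix form and once as two explicit regressions, which give the same point estimates.

```r
set.seed(7)
n   <- 2000
X   <- rnorm(n)
nu  <- rnorm(n)
eps <- 0.6 * nu + 0.8 * rnorm(n)
P   <- rexp(n) + 0.5 * X + nu
Y   <- 1 + 2 * X - 1 * P + eps

ctr <- function(v) v - mean(v)      # deviation from the sample mean
G   <- X^2                          # G(X) = X^2, not linear in X
q1  <- ctr(G)                       # (3a)
q2  <- ctr(G) * ctr(P)              # (3b)
q3  <- ctr(G) * ctr(Y)              # (3c)
q4  <- ctr(Y) * ctr(P)              # (3d)

W <- cbind(1, X, q1, q2, q3, q4)    # instruments: constant, X_t, (3a)-(3d)
R <- cbind(1, X, P)                 # regressors of the structural equation

# Matrix form: beta = (R' P_W R)^{-1} R' P_W Y with P_W = W (W'W)^{-1} W'
R_hat       <- W %*% solve(crossprod(W), crossprod(W, R))
beta_matrix <- solve(crossprod(R_hat, R), crossprod(R_hat, Y))

# Two explicit stages (same point estimates; the naive second-stage
# standard errors would be wrong and are not used here)
P_hat        <- fitted(lm(P ~ X + q1 + q2 + q3 + q4))
beta_twostep <- coef(lm(Y ~ X + P_hat))
beta_twostep
```

In practice higherMomentsIV delegates this step to ivreg from AER, which also supplies correct standard errors and the diagnostic tests shown in the summary output.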

Formula parameter

The formula argument follows a four-part notation:

A two-sided formula describing the model (e.g. y ~ X1 + X2 + P), a single endogenous regressor (e.g. P), and the exogenous variables from which the internal instrumental variables should be built (e.g. IIV(iiv=y2)), each part separated by a single vertical bar (|).

Each internal instrument to be built is specified with a call to the function IIV, and several IIV terms can be combined with +. IIV takes the following arguments:

iiv: Which internal instrument to build. One of g, gp, gy, yp, p2, y2 can be chosen.
g: Which function G(X) to use in the instrument. One of x2, x3, lnx, 1/x can be chosen. Only required for the instrument types that involve G, i.e. g, gp, and gy.
...: The exogenous regressors from which to build the internal instrument. If more than one is given, a separate instrument is built for each. Only required for the instrument types that involve G.

Note that all arguments to IIV are supplied as unquoted symbols, not as character strings.
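The mapping from IIV arguments to instruments can be illustrated with a small base R sketch. The helper build_iiv, the list g_functions, the default column names y and P, and the generated column names are hypothetical and do not reflect REndo's internal code. The helper takes one unquoted IIV(...) call, reads its arguments as symbols, and builds the columns defined by equations (3a)-(3f).

```r
# Hypothetical helper, not REndo's implementation.
g_functions <- list(
  x2    = function(x) x^2,
  x3    = function(x) x^3,
  lnx   = function(x) log(x),   # needs strictly positive regressors
  "1/x" = function(x) 1 / x
)

build_iiv <- function(term, data, y = "y", p = "P") {
  args <- as.list(term)[-1]                    # drop the function name IIV
  nms  <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  iiv  <- deparse(args[["iiv"]])               # e.g. "gp"
  ctr  <- function(v) v - mean(v)              # deviation from sample mean
  yc   <- ctr(data[[y]])
  pc   <- ctr(data[[p]])

  # Types without G(X): a single column, no regressors needed
  if (iiv %in% c("yp", "p2", "y2")) {
    q <- switch(iiv, yp = yc * pc, p2 = pc^2, y2 = yc^2)
    return(matrix(q, ncol = 1, dimnames = list(NULL, iiv)))
  }

  # Types with G(X): one column per unnamed regressor argument
  g    <- deparse(args[["g"]])                 # e.g. "x2" or "1/x"
  G    <- g_functions[[g]]
  regs <- vapply(args[nms == ""], deparse, character(1), USE.NAMES = FALSE)
  cols <- lapply(regs, function(r) {
    gc <- ctr(G(data[[r]]))
    switch(iiv, g = gc, gp = gc * pc, gy = gc * yc)
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste(iiv, g, regs, sep = "_")
  out
}

# Usage on illustrative data
set.seed(1)
d  <- data.frame(y = rnorm(50), P = rexp(50),
                 X1 = runif(50, 1, 3), X2 = runif(50, 1, 3))
m1 <- build_iiv(quote(IIV(iiv = gp, g = x2, X1, X2)), d)
m2 <- build_iiv(quote(IIV(iiv = y2)), d)
colnames(m1)   # "gp_x2_X1" "gp_x2_X2"
```

Passing symbols rather than strings is what allows a term such as g = 1/x to be written directly: it is captured as the expression 1/x and deparsed to the key "1/x".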

Optionally, additional external instrumental variables can be included in the instrumental variable regression. These external instruments must already be present in data and are provided as the fourth right-hand side part of the formula, again separated by a vertical bar.
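The multi-part structure can be seen by splitting the right-hand side of such a formula at each top-level vertical bar. The helper split_bars below is hypothetical and only illustrates how the parts line up; REndo's own formula parsing may work differently. It relies on the fact that R parses a | b | c | d as ((a | b) | c) | d, and that | binds more loosely than +.

```r
# Hypothetical helper: split an expression at each top-level `|`.
split_bars <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("|"))) {
    c(split_bars(expr[[2]]), list(expr[[3]]))
  } else {
    list(expr)
  }
}

f <- y ~ X1 + P | P | IIV(iiv = y2) + IIV(iiv = g, g = lnx, X1) | X2
parts <- split_bars(f[[3]])   # f[[2]] is the response y
vapply(parts, deparse, character(1))
# Part 1: regressors, part 2: endogenous regressor,
# part 3: internal instruments, part 4: external instruments
```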

See the Examples section for illustrations of how to specify the formula parameter.

References

Lewbel A (1997). “Constructing Instruments for Regressions with Measurement Error When No Additional Data are Available, With an Application to Patents and R&D.” Econometrica, 65(5), 1201–1213.

Examples

library(REndo)
data("dataHigherMoments")
# P is the endogenous regressor in all examples

# 2 IVs with g*p, g=x^2, separately for each regressor X1 and X2.
hm <- higherMomentsIV(y~X1+X2+P|P|IIV(iiv=gp, g=x2, X1, X2),
                      data = dataHigherMoments)
#> The following internal instruments were built: IIV(iiv=gp,g=x2,X1,X2).
#> Fitting an instrumental variable regression with model y ~ X1 + X2 + P|X1 + X2 + IIV(iiv=gp,g=x2,X1,X2).
# same as above
hm <- higherMomentsIV(y~X1+X2+P|P|IIV(iiv=gp, g=x2, X1) +
                                  IIV(iiv=gp, g=x2, X2),
                      data = dataHigherMoments)
#> The following internal instruments were built: IIV(iiv=gp,g=x2,X1), IIV(iiv=gp,g=x2,X2).
#> Fitting an instrumental variable regression with model y ~ X1 + X2 + P|X1 + X2 + IIV(iiv=gp,g=x2,X1) + IIV(iiv=gp,g=x2,X2).

# 3 different IVs
hm <- higherMomentsIV(y~X1+X2+P|P|IIV(iiv=y2) + IIV(iiv=yp) +
                                  IIV(iiv=g,g=x3,X1),
                      data = dataHigherMoments)
#> The following internal instruments were built: IIV(iiv=y2), IIV(iiv=yp), IIV(iiv=g,g=x3,X1).
#> Fitting an instrumental variable regression with model y ~ X1 + X2 + P|X1 + X2 + IIV(iiv=y2) + IIV(iiv=yp) + IIV(iiv=g,g=x3,X1).

# use X2 as external IV
hm <- higherMomentsIV(y~X1+P|P|IIV(iiv=y2)+IIV(iiv=g,g=lnx,X1)| X2,
                      data = dataHigherMoments)
#> The following internal instruments were built: IIV(iiv=y2), IIV(iiv=g,g=lnx,X1).
#> Fitting an instrumental variable regression with model y ~ X1 + P|X1 + IIV(iiv=y2) + IIV(iiv=g,g=lnx,X1) + X2.
summary(hm)
#> 
#> Call:
#> higherMomentsIV(formula = y ~ X1 + P | P | IIV(iiv = y2) + IIV(iiv = g, 
#>     g = lnx, X1) | X2, data = dataHigherMoments)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -6.49866 -1.42920 -0.01454  1.45356 10.16266 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  4.07558    0.86687   4.701 2.72e-06 ***
#> X1           4.57633    0.43273  10.576  < 2e-16 ***
#> P           -0.98017    0.08896 -11.018  < 2e-16 ***
#> 
#> Diagnostic tests:
#>                   df1  df2 statistic  p-value    
#> Weak instruments    3 2495      5.90 0.000522 ***
#> Wu-Hausman          1 2496      2.28 0.131143    
#> Sargan              2   NA   1015.68  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 2.174 on 2497 degrees of freedom
#> Multiple R-Squared: 0.8378,	Adjusted R-squared: 0.8377 
#> Wald test:   107 on 2 and 2497 DF,  p-value: < 2.2e-16 
#>