Instrumental Variables


Bias of OLS estimator from omitted variables

Bias of OLS estimator from measurement error

Parameter estimation for simultaneous equations

Structural and reduced forms

Parameter identification problem with simultaneous equations

Identification terminology

A system is under-identified if there are not enough estimators for all structural parameters.

A system is exactly identified if there are the same number of estimators as structural parameters.

A system is over-identified if there are more estimators than structural parameters.

In general, the structural form is a system with \(n\) endogenous variables and \(m\) exogenous variables.

We can write this in matrix form.

\(B\mathbf y =\Gamma \mathbf{x} + \mathbf{\epsilon}\)

We can use this to get:

\(\mathbf{y} =B^{-1}\Gamma \mathbf{x} + B^{-1}\mathbf{ \epsilon}\)
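As a minimal numerical sketch of this step, the snippet below builds a hypothetical two-equation system and computes the reduced-form coefficient matrix \(B^{-1}\Gamma\). The numbers in `B` and `Gamma`, and the name `Pi`, are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical structural system B y = Gamma x + eps (values are assumptions).
# y holds two endogenous variables, x holds a constant and one exogenous variable.
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])
Gamma = np.array([[10.0, 0.5],
                  [2.0, 0.0]])

# Reduced-form coefficients: solve B Pi = Gamma rather than inverting B explicitly.
Pi = np.linalg.solve(B, Gamma)
print(Pi)
```

Each row of `Pi` gives one endogenous variable as a function of the exogenous variables only. This requires \(B\) to be invertible.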

We estimate by placing restrictions on \(\Gamma\).

Structural models

If our data generating process is:

\(Q=\alpha + \beta P +\epsilon \)

We can estimate \(\alpha\) and \(\beta\) by measuring \(P\) and \(Q\).

If, however, the data generating process involves simultaneous equations, we can have:

\(Q=\alpha_1 + \beta_1 P + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \epsilon_2 \)

Reduced form

We can reduce this:

\(\alpha_1 + \beta_1 P + \epsilon_1 =\alpha_2 + \beta_2 P + \epsilon_2 \)

\((\alpha_1 -\alpha_2 )+ (\beta_1 -\beta_2 )P + (\epsilon_1 -\epsilon_2 )=0\)

\(P =\dfrac{\alpha_2-\alpha_1 }{\beta_1-\beta_2}+\dfrac{\epsilon_2-\epsilon_1 }{\beta_1-\beta_2}\)

We can rewrite this as:

\(P=\pi_1 + \tau_1 \)

Similarly we can reduce for \(Q\):

\(Q =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2 }{\beta_1-\beta_2}+\dfrac{\beta_1\epsilon_2 -\beta_2\epsilon_1}{\beta_1-\beta_2}\)

\(Q= \pi_2 + \tau_2\)

We can’t directly estimate structural models

If \(P\) is correlated with \(\epsilon_1\) or \(\epsilon_2\) then our estimates for \(\beta_1\) and \(\beta_2\) will be biased.

This also affects \(Q\).

From the reduced forms we can see that \(P\) depends on both \(\epsilon_1\) and \(\epsilon_2\), so it is correlated with the error terms due to simultaneity.
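A small simulation can make the bias concrete. This is a sketch under assumed parameter values (the numbers for \(\alpha_1, \beta_1, \alpha_2, \beta_2\) and the standard normal errors are illustrative choices, not part of the notes). It generates \(P\) and \(Q\) from the reduced form and then regresses \(Q\) on \(P\) by OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed structural parameters, for illustration only.
alpha1, beta1 = 10.0, -1.0
alpha2, beta2 = 2.0, 1.0

eps1 = rng.normal(size=n)
eps2 = rng.normal(size=n)

# Equilibrium P from the reduced form, then Q from the first structural equation.
P = (alpha2 - alpha1 + eps2 - eps1) / (beta1 - beta2)
Q = alpha1 + beta1 * P + eps1

# OLS slope of Q on P, and the sample covariance between P and eps1.
slope = np.cov(P, Q)[0, 1] / np.var(P, ddof=1)
cov_P_eps1 = np.cov(P, eps1)[0, 1]
print(slope, cov_P_eps1)
```

With these assumed values the OLS slope matches neither \(\beta_1\) nor \(\beta_2\), because \(P\) is correlated with \(\epsilon_1\).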

The identification problem

We can estimate \(\pi_1 \) and \(\pi_2\), but this does not allow us to identify any of the structural parameters.

We have \(2\) estimators, but \(4\) parameters.

This is the identification problem.
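To illustrate, the sketch below shows two different sets of structural parameters that produce exactly the same \(\pi_1\) and \(\pi_2\). The function name `reduced_form` and both parameter sets are hypothetical, chosen only to demonstrate the point.

```python
def reduced_form(alpha1, beta1, alpha2, beta2):
    """Return (pi_1, pi_2), the reduced-form constants for P and Q."""
    denom = beta1 - beta2
    pi_1 = (alpha2 - alpha1) / denom
    pi_2 = (alpha2 * beta1 - alpha1 * beta2) / denom
    return pi_1, pi_2

# Two distinct (assumed) structural parameter sets.
params_a = dict(alpha1=10.0, beta1=-1.0, alpha2=2.0, beta2=1.0)
params_b = dict(alpha1=14.0, beta1=-2.0, alpha2=4.0, beta2=0.5)

pi_a = reduced_form(**params_a)
pi_b = reduced_form(**params_b)
print(pi_a, pi_b)
```

Since both parameter sets imply the same reduced form, data on \(P\) and \(Q\) alone cannot distinguish between them.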

2 Stage OLS

2 Stage OLS (2SOLS) estimator


If \(x\) is correlated with the error term the OLS estimate will be biased.

2 Stage OLS - first stage

We have

\(y_i=x_i \theta + \epsilon_i \)

\(x_i=z_i \rho +\mu_i\)

We run OLS on the second equation to get \(\hat \rho \).

\(\hat \rho =(Z^TZ)^{-1}Z^TX\)

We use this to get predicted values of \(X\).

\(\hat X=Z\hat \rho =Z(Z^TZ)^{-1}Z^TX = P_ZX\)

2 Stage OLS - second stage

We then regress \(y\) on the estimated \(X\):

\(y_i=\hat x_i\theta +\epsilon_i\)

Our estimate is then:

\(\hat \theta_{2SOLS} = (\hat X^T\hat X)^{-1}\hat X^Ty\)

\(\hat \theta_{2SOLS} = ((P_ZX)^TP_ZX)^{-1}(P_ZX)^Ty\)

Since \(P_Z\) is symmetric and idempotent, this simplifies to:

\(\hat \theta_{2SOLS} = (X^TP_ZX)^{-1}X^TP_Zy\)

If \(Z\) has the same number of columns as \(X\), and \(Z^TX\) is invertible, this collapses to:

\(\hat {\theta_{2SOLS}} = (Z^TX)^{-1}Z^Ty\)
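The two stages above can be sketched in NumPy as follows. The data generating process (an unobserved confounder `u`, instruments `z1` and `z2`, and a true slope of 2) is an assumption made for this illustration. The snippet checks that the two-stage procedure agrees with the closed form, and that with as many instruments as regressors it agrees with \((Z^TX)^{-1}Z^Ty\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
theta_true = 2.0  # assumed true slope

u = rng.normal(size=n)    # unobserved confounder, part of the error in y
z1 = rng.normal(size=n)   # instruments: affect x, unrelated to u
z2 = rng.normal(size=n)
x = 0.8 * z1 + 0.5 * z2 + u + rng.normal(size=n)
y = theta_true * x + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

# OLS for comparison: biased because x is correlated with the error (u).
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# First stage: regress X on Z and form fitted values X_hat = P_Z X.
rho_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ rho_hat

# Second stage: regress y on X_hat.
theta_2s = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Closed form (X^T P_Z X)^{-1} X^T P_Z y, using X^T P_Z = X_hat^T.
theta_closed = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# Exactly identified case: one instrument per regressor.
Z1 = np.column_stack([np.ones(n), z1])
X_hat1 = Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ X)
theta_2s_exact = np.linalg.solve(X_hat1.T @ X_hat1, X_hat1.T @ y)
theta_iv = np.linalg.solve(Z1.T @ X, Z1.T @ y)

print(theta_ols[1], theta_2s[1], theta_iv[1])
```

The fitted values are used in place of the \(n \times n\) projection matrix \(P_Z\), which avoids building a very large matrix.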

Bias of the 2SOLS estimator

Variance of the 2SOLS estimator


Identification through exogenous variables

Previously our structural model was:

\(Q=\alpha_1 + \beta_1 P + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \epsilon_2 \)

And our reduced form:

\(P =\dfrac{\alpha_2-\alpha_1 }{\beta_1-\beta_2}+\dfrac{\epsilon_2-\epsilon_1 }{\beta_1-\beta_2}\)

\(Q =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2 }{\beta_1-\beta_2}+\dfrac{\beta_1\epsilon_2 -\beta_2\epsilon_1}{\beta_1-\beta_2}\)


\(P=\pi_1 + \tau_1 \)

\(Q= \pi_2 + \tau_2\)

Adding another variable

This time we add another measured variable, \(I\).

\(Q=\alpha_1 + \beta_1 P + \theta_1 I + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \theta_2 I + \epsilon_2 \)

The reduced form is now:

\(P =\dfrac{\alpha_2 -\alpha_1 }{\beta_1-\beta_2}+\dfrac{\theta_2-\theta_1 }{\beta_1-\beta_2}I+\dfrac{\epsilon_2-\epsilon_1}{\beta_1-\beta_2}\)

\(Q =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2 }{\beta_1-\beta_2}+\dfrac{\theta_2\beta_1-\theta_1\beta_2}{\beta_1-\beta_2}I+\dfrac{\beta_1\epsilon_2 -\beta_2\epsilon_1}{\beta_1-\beta_2}\)


\(P =\pi_{11} +\pi_{12}I + \tau_1 \)

\(Q= \pi_{21} +\pi_{22}I + \tau_2 \)

Previously we could only estimate \(\pi_1 \) and \(\pi_2 \), as \(\hat \pi_1\) and \(\hat \pi_2\) respectively.

We can now create estimators \(\hat \pi_{11}\), \(\hat \pi_{12}\), \(\hat \pi_{21}\) and \(\hat \pi_{22}\) by regressing \(P\) and \(Q\) on \(I\).

Identification with an exogenous variable

We now have \(4\) estimators and \(6\) parameters, meaning that we still cannot identify the model.

Partial identification

Can we use \(\hat \pi \) to identify any of the structural parameters?

We know that:

  • \(\pi_{11} =\dfrac{\alpha_2 -\alpha_1 }{\beta_1-\beta_2}\)

  • \(\pi_{12} =\dfrac{\theta_2-\theta_1}{\beta_1-\beta_2}\)

  • \(\pi_{21} =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2}{\beta_1-\beta_2}\)

  • \(\pi_{22} =\dfrac{\theta_2\beta_1-\theta_1\beta_2}{\beta_1-\beta_2} \)

If the exogenous variable appears in only one of the two equations, so \(\theta_1=0\), we have:

  • \(\pi_{11} =\dfrac{\alpha_2 -\alpha_1 }{\beta_1-\beta_2}\)

  • \(\pi_{12} =\dfrac{\theta_2}{\beta_1-\beta_2}\)

  • \(\pi_{21} =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2}{\beta_1-\beta_2}\)

  • \(\pi_{22} =\dfrac{\theta_2\beta_1}{\beta_1-\beta_2} \)

So we can see that:

\(\hat \beta_1 = \dfrac{\hat \pi_{22}}{\hat \pi_{12}}\)

This means we now have:

  • \(\pi_{11} =\dfrac{\pi_{12}(\alpha_2 -\alpha_1 )}{\pi_{22}-\pi_{12}\beta_2}\)

  • \(\pi_{12} =\dfrac{\pi_{12}\theta_2}{\pi_{22}-\pi_{12}\beta_2}\)

  • \(\pi_{21} =\dfrac{\pi_{12}(\alpha_2\beta_1-\alpha_1\beta_2)}{\pi_{22}-\pi_{12}\beta_2}\)

  • \(\pi_{22} =\dfrac{\pi_{12}\theta_2\beta_1}{\pi_{22}-\pi_{12}\beta_2}\)

We can use this to also identify \(\alpha_1\).
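One way to see this, as a worked step: with \(\theta_1=0\), taking \(\pi_{21}-\beta_1\pi_{11}\) and substituting the expressions above gives

\(\pi_{21}-\beta_1\pi_{11} =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2-\beta_1\alpha_2+\beta_1\alpha_1}{\beta_1-\beta_2}=\alpha_1\)

so \(\hat \alpha_1 = \hat \pi_{21}-\hat \beta_1\hat \pi_{11}\). The sketch below checks both \(\hat \beta_1\) and \(\hat \alpha_1\) on simulated data. All parameter values, the normal distributions and the helper name `ols` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Assumed structural parameters; theta1 = 0, so I enters only the second equation.
alpha1, beta1 = 10.0, -1.0
alpha2, beta2, theta2 = 2.0, 1.0, 1.5

I = rng.normal(size=n)
eps1 = rng.normal(size=n)
eps2 = rng.normal(size=n)

# Equilibrium values from the reduced form.
P = (alpha2 - alpha1 + theta2 * I + eps2 - eps1) / (beta1 - beta2)
Q = alpha1 + beta1 * P + eps1

def ols(W, v):
    """Least-squares coefficients of v on the columns of W."""
    return np.linalg.lstsq(W, v, rcond=None)[0]

W = np.column_stack([np.ones(n), I])
pi11_hat, pi12_hat = ols(W, P)   # reduced form for P
pi21_hat, pi22_hat = ols(W, Q)   # reduced form for Q

# Recover the structural parameters of the first equation.
beta1_hat = pi22_hat / pi12_hat
alpha1_hat = pi21_hat - beta1_hat * pi11_hat
print(beta1_hat, alpha1_hat)
```

The parameters of the second equation (\(\alpha_2\), \(\beta_2\), \(\theta_2\)) remain unidentified in this setup.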

Complete identification

If each of the two equations has its own exogenous variable that is excluded from the other equation, we can fully identify the model.

We will have \(6\) estimators and \(6\) parameters.

We are estimating:

\(Q=\alpha_1 + \beta_1 P + \theta_1 I + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \theta_2 J + \epsilon_2 \)

\(I\) and \(J\) are essentially instrumental variables for the model.

\(I\) is an instrumental variable for demand shocks, and \(J\) is an instrumental variable for supply shocks.
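As a worked sketch of why the counts match, write the reduced form of this system as \(P =\pi_{11} +\pi_{12}I +\pi_{13}J + \tau_1\) and \(Q =\pi_{21} +\pi_{22}I +\pi_{23}J + \tau_2\). The labels \(\pi_{13}\) and \(\pi_{23}\) are introduced here for illustration. Assuming \(\theta_1 \neq 0\) and \(\theta_2 \neq 0\), solving the two structural equations for \(P\) and \(Q\) gives:

\(\pi_{12} =\dfrac{-\theta_1}{\beta_1-\beta_2}\) and \(\pi_{22} =\dfrac{-\theta_1\beta_2}{\beta_1-\beta_2}\), so \(\beta_2 = \dfrac{\pi_{22}}{\pi_{12}}\)

\(\pi_{13} =\dfrac{\theta_2}{\beta_1-\beta_2}\) and \(\pi_{23} =\dfrac{\theta_2\beta_1}{\beta_1-\beta_2}\), so \(\beta_1 = \dfrac{\pi_{23}}{\pi_{13}}\)

The remaining parameters then follow from matching coefficients in each structural equation:

\(\alpha_1 = \pi_{21}-\beta_1\pi_{11}\) and \(\theta_1 = \pi_{22}-\beta_1\pi_{12}\)

\(\alpha_2 = \pi_{21}-\beta_2\pi_{11}\) and \(\theta_2 = \pi_{23}-\beta_2\pi_{13}\)

Note that the variable excluded from an equation is the one that identifies that equation's slope: \(J\) identifies \(\beta_1\), and \(I\) identifies \(\beta_2\).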

Power of instruments

The Instrumental Variable (IV) estimator

Instrumental Variable (IV) estimator

\(\hat {\theta_{IV}} = (Z^TX)^{-1}Z^Ty\)

2SOLS collapses to IV when there are exactly as many instruments as regressors.

Bias of the IV estimator

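The IV estimator converges to the actual parameter so long as \(\epsilon \) is uncorrelated with \(Z\) and \(Z\) is correlated with \(X\). In finite samples it is generally biased.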
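To see where the condition on \(Z\) comes from, substitute \(y=X\theta +\epsilon \) into the estimator:

\(\hat \theta_{IV} = (Z^TX)^{-1}Z^T(X\theta +\epsilon )\)

\(\hat \theta_{IV} = \theta + (Z^TX)^{-1}Z^T\epsilon \)

The second term vanishes in large samples when \(Z\) is uncorrelated with \(\epsilon \), provided \(Z^TX\) does not shrink towards a singular matrix, which is the requirement that the instruments are correlated with \(X\).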

Variance of the IV estimator

In OLS we had:

\(\hat {\theta_{OLS}} = (X^TX)^{-1}X^Ty\)

\(Var [\hat {\theta_{OLS}}]=(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}\)

With IV we have

\(\hat {\theta_{IV}} = (Z^TX)^{-1}Z^Ty\)

\(Var [\hat {\theta_{IV}}]=(Z^TX)^{-1}Z^T\Omega Z(X^TZ)^{-1}\)

We can use weighted least squares for \(\Omega \).
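As a minimal sketch, the snippet below computes the IV estimate and its variance under the simplifying assumption \(\Omega =\sigma^2 I\) (homoskedastic, uncorrelated errors), so the sandwich becomes \(\sigma^2 (Z^TX)^{-1}Z^TZ(X^TZ)^{-1}\). The simulated data, the true coefficients and the degrees-of-freedom correction are all assumptions for illustration. A different choice of \(\Omega\) would replace the `sigma2 * (Z.T @ Z)` term.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

u = rng.normal(size=n)   # unobserved confounder
z = rng.normal(size=n)   # instrument
x = z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)   # assumed true slope of 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

A = np.linalg.inv(Z.T @ X)   # (Z^T X)^{-1}
theta_iv = A @ (Z.T @ y)

# Residuals use the original X, not first-stage fitted values.
resid = y - X @ theta_iv
sigma2 = resid @ resid / (n - X.shape[1])

# Sandwich form with Omega = sigma^2 I; the right-hand factor is A transposed.
var_iv = sigma2 * A @ (Z.T @ Z) @ A.T
se_iv = np.sqrt(np.diag(var_iv))
print(theta_iv, se_iv)
```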

Choosing instrumental variables

Double selection


Natural experiments

Non-linear models in the first stage

Random Effects Instrumental Variables (REIV)

Fixed Effects Instrumental Variables (FEIV)


synthetic IV indep on nuisance as alternative to matching.

IV: h3 on non-linear models for first stage


controlled experiments

two sources: missing data and simultaneous

variations in government rollouts, lotteries

IV may only affect subset of individuals

For example, using the draft number as an IV for military service. This is only an instrument for conscripts, not volunteers.

generally, need to rationalise this and time series. There’s stuff there on natural experiments etc

define confounding in IV? or in dependent variables? is different issue to the one of correlation with error?

h3 on Limited Information Maximum Likelihood

h3 on K-class estimation

Contrast loss and Siamese h3? One shot classification

IV: frame around parameter estimation when don’t observe some variables. This can mean the direct variable can’t be measured, or that some controls can’t be measured

which factors to include? All?

page on structural and reduced forms

h3 on simultaneous equations there? Eg \(y=c_1+\theta_1 X+\epsilon_1\) \(y=c_2+\theta_2 X+\rho Z+\epsilon_2\)

We can turn this into the reduced form: \(X=c_3+\theta_3Z+\epsilon_3\) \(y=c_4+\theta_4Z+\epsilon_4\)

difference between confounding and correlation with error?