A system is under-identified if there are not enough estimators for all structural parameters.

A system is exactly identified if there are the same number of estimators as structural parameters.

A system is over-identified if there are more estimators than structural parameters.

In general we have in our structural form:

\(\sum_{i=1}^n\beta_{ij}y_i=\sum_{i=1}^m\gamma_{ij}x_i+\epsilon_j\)

This is a system with \(n\) endogenous variables and \(m\) exogenous variables.

We can write this in matrix form.

\(B\mathbf y =\Gamma \mathbf{x} + \mathbf{\epsilon}\)

If \(B\) is invertible, we can solve for \(\mathbf{y}\) to get the reduced form:

\(\mathbf{y} =B^{-1}\Gamma \mathbf{x} + B^{-1}\mathbf{ \epsilon}\)
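
As a minimal numerical sketch of this step, the snippet below solves a small system both ways and checks that the structural and reduced forms agree. The values in `B`, `Gamma`, `x` and `eps` are made up for illustration and are not taken from any model in these notes.

```python
import numpy as np

# Hypothetical structural coefficients for a system with
# 2 endogenous and 3 exogenous variables (illustrative values only).
B = np.array([[1.0, -0.5],
              [1.0,  0.8]])
Gamma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 0.0, 3.0]])

# Reduced-form coefficients Pi = B^{-1} Gamma.
# np.linalg.solve avoids forming the inverse explicitly.
Pi = np.linalg.solve(B, Gamma)

x = np.array([1.0, 2.0, -1.0])   # one draw of the exogenous variables
eps = np.array([0.1, -0.2])      # one draw of the structural errors

# Solve the structural form B y = Gamma x + eps for y.
y = np.linalg.solve(B, Gamma @ x + eps)

# Evaluate the reduced form y = Pi x + B^{-1} eps.
y_reduced = Pi @ x + np.linalg.solve(B, eps)

print(np.allclose(y, y_reduced))
```

The zeros in `Gamma` are exclusion restrictions: each one removes an exogenous variable from one of the equations.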

We make the structural parameters identifiable by placing restrictions on \(\Gamma\), for example by setting some of its entries to zero.

If our data generating process is:

\(Q=\alpha + \beta P +\epsilon \)

We can estimate \(\alpha\) and \(\beta\) by measuring \(P\) and \(Q\).

If, however, the data generating process involves simultaneous equations, we can have:

\(Q=\alpha_1 + \beta_1 P + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \epsilon_2 \)

We can reduce this:

\(\alpha_1 + \beta_1 P + \epsilon_1 =\alpha_2 + \beta_2 P + \epsilon_2 \)

\((\alpha_1 -\alpha_2 )+ (\beta_1 -\beta_2 )P + (\epsilon_1 -\epsilon_2 )=0\)

\(P =\dfrac{\alpha_2-\alpha_1 }{\beta_1-\beta_2}+\dfrac{\epsilon_2-\epsilon_1 }{\beta_1-\beta_2}\)

We can rewrite this as:

\(P=\pi_1 + \tau_1 \)

Similarly we can reduce for \(Q\):

\(Q =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2 }{\beta_1-\beta_2}+\dfrac{\beta_1\epsilon_2 -\beta_2\epsilon_1}{\beta_1-\beta_2}\)

\(Q= \pi_2 + \tau_2\)

If \(P\) is correlated with \(\epsilon_1\) or \(\epsilon_2\) then our estimates for \(\beta_1\) and \(\beta_2\) will be biased.

The same applies to \(Q\).

From the reduced forms we can see that \(P\) depends on both \(\epsilon_1\) and \(\epsilon_2\), so it is correlated with both error terms. This is due to simultaneity.

We can estimate \(\pi_1 \) and \(\pi_2\), but this does not allow us to identify any of the structural parameters.

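We have \(2\) estimators, \(\hat \pi_1\) and \(\hat \pi_2\), but \(4\) structural parameters: \(\alpha_1\), \(\beta_1\), \(\alpha_2\) and \(\beta_2\).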

This is the identification problem.
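
The simulation below is a rough sketch of the problem, under assumed parameter values and independent standard normal errors (none of which come from the text). It generates \(P\) and \(Q\) from the reduced form and shows that an OLS regression of \(Q\) on \(P\) recovers neither slope, while the sample means recover only \(\pi_1\) and \(\pi_2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structural parameters (illustrative values only).
a1, b1 = 10.0, -1.0   # first equation
a2, b2 = 2.0, 1.0     # second equation

e1 = rng.normal(size=n)
e2 = rng.normal(size=n)

# Generate P from its reduced form, then Q from the first structural equation.
P = (a2 - a1) / (b1 - b2) + (e2 - e1) / (b1 - b2)
Q = a1 + b1 * P + e1

# OLS slope of Q on P.
slope = np.cov(P, Q)[0, 1] / np.var(P, ddof=1)

# The sample means estimate pi_1 and pi_2.
pi1_hat, pi2_hat = P.mean(), Q.mean()

print("OLS slope:", slope)
print("pi_1 estimate:", pi1_hat, "pi_2 estimate:", pi2_hat)
```

With uncorrelated errors of equal variance, the population OLS slope is the average of \(\beta_1\) and \(\beta_2\), which is \(0\) for these values, so the regression matches neither equation.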

If \(x\) is correlated with the error term the OLS estimate will be biased.

We have

\(y_i=x_i \theta + \epsilon_i \)

\(x_i=z_i \rho +\mu_i\)

We run OLS on the second equation (the first stage) to get \(\hat \rho \).

\(\hat \rho =(Z^TZ)^{-1}Z^TX\)

We use this to get predicted values of \(X\).

\(\hat X=Z\hat \rho =Z(Z^TZ)^{-1}Z^TX = P_ZX\)

We then regress \(y\) on the estimated \(X\):

\(y_i=\hat x_i\theta +\epsilon_i\)

Our estimator is then:

\(\hat {\theta_{2SOLS}} = (\hat X^T\hat X)^{-1}\hat X^Ty\)

\(\hat {\theta_{2SOLS}} = ((P_ZX)^TP_ZX)^{-1}(P_ZX)^Ty\)

\(\hat {\theta_{2SOLS}} = (X^TP_ZX)^{-1}X^TP_Zy\)

The last step uses the fact that \(P_Z\) is symmetric and idempotent.

If \(Z\) has the same number of columns as \(X\) (the exactly identified case) and \(Z^TX\) is invertible, this collapses to:

\(\hat {\theta_{2SOLS}} = (Z^TX)^{-1}Z^Ty\)
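
The two stages can be sketched in NumPy as follows. The data generating process, the coefficient values and the names `u`, `theta_true` are assumptions made for illustration: `u` is an unobserved term that enters both \(x\) and the error, which is what makes OLS unreliable here. With one instrument and one regressor, the two-stage estimate should match \((Z^TX)^{-1}Z^Ty\) up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
theta_true = 2.0                      # hypothetical true coefficient

z = rng.normal(size=n)                # instrument
u = rng.normal(size=n)                # unobserved term shared by x and the error
x = 0.8 * z + u + rng.normal(size=n)  # x_i = z_i rho + mu_i
y = theta_true * x + u + rng.normal(size=n)  # error (u + noise) is correlated with x

X = x[:, None]
Z = z[:, None]

# Plain OLS of y on x, for comparison.
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Stage 1: regress X on Z and form the fitted values X_hat = P_Z X.
rho_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ rho_hat

# Stage 2: regress y on the fitted values.
theta_2s = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Exactly identified case: the collapsed formula (Z^T X)^{-1} Z^T y.
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

print("OLS:", theta_ols[0])
print("two-stage:", theta_2s[0])
print("collapsed:", theta_iv[0])
```

In this setup the OLS estimate is pulled away from `theta_true` by the shared term `u`, while the two-stage and collapsed estimates agree with each other.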

Previously our structural model was:

\(Q=\alpha_1 + \beta_1 P + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \epsilon_2 \)

And our reduced form:

\(P =\dfrac{\alpha_2-\alpha_1 }{\beta_1-\beta_2}+\dfrac{\epsilon_2-\epsilon_1 }{\beta_1-\beta_2}\)

\(Q =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2 }{\beta_1-\beta_2}+\dfrac{\beta_1\epsilon_2 -\beta_2\epsilon_1}{\beta_1-\beta_2}\)

Or:

\(P=\pi_1 + \tau_1 \)

\(Q= \pi_2 + \tau_2\)

This time we add another measured variable, \(I\).

\(Q=\alpha_1 + \beta_1 P + \theta_1 I + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \theta_2 I + \epsilon_2 \)

The reduced form is now:

\(P =\dfrac{\alpha_2 -\alpha_1 }{\beta_1-\beta_2}+\dfrac{\theta_2-\theta_1 }{\beta_1-\beta_2}I+\dfrac{\epsilon_2-\epsilon_1}{\beta_1-\beta_2}\)

\(Q =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2 }{\beta_1-\beta_2}+\dfrac{\theta_2\beta_1-\theta_1\beta_2}{\beta_1-\beta_2}I+\dfrac{\beta_1\epsilon_2 -\beta_2\epsilon_1}{\beta_1-\beta_2}\)

Or:

\(P =\pi_{11} +\pi_{12}I + \tau_1 \)

\(Q= \pi_{21} +\pi_{22}I + \tau_2 \)

Previously we could only estimate \(\pi_1 \) and \(\pi_2 \).

By regressing \(P\) and \(Q\) on \(I\), we can now create estimators \(\hat \pi_{11}\), \(\hat \pi_{12}\), \(\hat \pi_{21}\) and \(\hat \pi_{22}\).

We now have \(4\) estimators and \(6\) parameters, meaning that we still cannot identify the model.

Can we use \(\hat \pi \) to identify any of the structural parameters?

We know that:

\(\pi_{11} =\dfrac{\alpha_2 -\alpha_1 }{\beta_1-\beta_2}\)

\(\pi_{12} =\dfrac{\theta_2-\theta_1}{\beta_1-\beta_2}\)

\(\pi_{21} =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2}{\beta_1-\beta_2}\)

\(\pi_{22} =\dfrac{\theta_2\beta_1-\theta_1\beta_2}{\beta_1-\beta_2} \)

If the exogenous variable only enters one of the two equations, say the second, so that \(\theta_1=0\), we have:

\(\pi_{11} =\dfrac{\alpha_2 -\alpha_1 }{\beta_1-\beta_2}\)

\(\pi_{12} =\dfrac{\theta_2}{\beta_1-\beta_2}\)

\(\pi_{21} =\dfrac{\alpha_2\beta_1-\alpha_1\beta_2}{\beta_1-\beta_2}\)

\(\pi_{22} =\dfrac{\theta_2\beta_1}{\beta_1-\beta_2} \)

So we can see that:

\(\hat \beta_1 = \dfrac{\hat \pi_{22}}{\hat \pi_{12}}\)

Since \(\beta_1-\beta_2 = \dfrac{\pi_{22}-\pi_{12}\beta_2}{\pi_{12}}\), we can rewrite the denominators:

\(\pi_{11} =\dfrac{\pi_{12}(\alpha_2 -\alpha_1 )}{\pi_{22}-\pi_{12}\beta_2}\)

\(\pi_{12} =\dfrac{\pi_{12}\theta_2}{\pi_{22}-\pi_{12}\beta_2}\)

\(\pi_{21} =\dfrac{\pi_{12}(\alpha_2\beta_1-\alpha_1\beta_2)}{\pi_{22}-\pi_{12}\beta_2}\)

\(\pi_{22} =\dfrac{\pi_{12}\theta_2\beta_1}{\pi_{22}-\pi_{12}\beta_2}\)

We can use this to also identify \(\alpha_1\).
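
A sketch of this recovery, sometimes called indirect least squares, is shown below. It relies on \(\alpha_1 = \pi_{21} - \beta_1\pi_{11}\), which follows from the definitions of \(\pi_{11}\) and \(\pi_{21}\) above. The parameter values and the normal distributions for \(I\) and the errors are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical structural parameters (illustrative values only).
a1, b1 = 10.0, -1.0          # first equation, with theta_1 = 0
a2, b2, t2 = 2.0, 1.0, 1.5   # second equation, which includes I

I = rng.normal(size=n)
e1 = rng.normal(size=n)
e2 = rng.normal(size=n)

# Reduced form for P with theta_1 = 0, then Q from the first structural equation.
P = ((a2 - a1) + t2 * I + (e2 - e1)) / (b1 - b2)
Q = a1 + b1 * P + e1

# Estimate the reduced form by OLS of P and Q on a constant and I.
W = np.column_stack([np.ones(n), I])
pi11, pi12 = np.linalg.lstsq(W, P, rcond=None)[0]
pi21, pi22 = np.linalg.lstsq(W, Q, rcond=None)[0]

# Recover the structural parameters of the first equation.
b1_hat = pi22 / pi12
a1_hat = pi21 - b1_hat * pi11

print("beta_1 estimate:", b1_hat)
print("alpha_1 estimate:", a1_hat)
```

Only the first equation is recovered this way. The parameters \(\alpha_2\), \(\beta_2\) and \(\theta_2\) remain unidentified because nothing shifts the first equation independently of the second.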

If each of the two equations has its own exogenous variable, which is excluded from the other equation, we can fully identify the model.

We will have \(6\) estimators and \(6\) parameters.

We are estimating:

\(Q=\alpha_1 + \beta_1 P + \theta_1 I + \epsilon_1 \)

\(Q=\alpha_2 + \beta_2 P + \theta_2 J + \epsilon_2 \)

\(I\) and \(J\) are essentially instrumental variables for the model.

\(I\) is an instrumental variable for demand shocks, and \(J\) is an instrumental variable for supply shocks.

\(\hat {\theta_{IV}} = (Z^TX)^{-1}Z^Ty\)

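2SOLS collapses to IV when the model is exactly identified, that is, when \(Z\) has the same number of columns as \(X\).

The IV estimator converges to the true parameter so long as \(\epsilon \) is uncorrelated with \(Z\) and \(Z\) is correlated with \(X\).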
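
As a rough illustration of full identification, the sketch below simulates the two-equation system with \(I\) and \(J\) and estimates each equation by IV. In each equation \(P\) is instrumented by the exogenous variable that is excluded from that equation, while the included exogenous variable acts as its own instrument. The parameter values, the distributions and the helper `iv` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical structural parameters (illustrative values only).
a1, b1, t1 = 10.0, -1.0, 2.0   # equation 1 includes I
a2, b2, t2 = 2.0, 1.0, 1.5     # equation 2 includes J

I = rng.normal(size=n)
J = rng.normal(size=n)
e1 = rng.normal(size=n)
e2 = rng.normal(size=n)

# Equate the two structural equations and solve for P, then get Q from equation 1.
P = ((a2 - a1) + t2 * J - t1 * I + (e2 - e1)) / (b1 - b2)
Q = a1 + b1 * P + t1 * I + e1


def iv(Z, X, y):
    """Exactly identified IV estimator (Z^T X)^{-1} Z^T y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


ones = np.ones(n)

# Equation 1: regressors (1, P, I); J is excluded, so it instruments P.
eq1 = iv(np.column_stack([ones, J, I]), np.column_stack([ones, P, I]), Q)

# Equation 2: regressors (1, P, J); I is excluded, so it instruments P.
eq2 = iv(np.column_stack([ones, I, J]), np.column_stack([ones, P, J]), Q)

print("equation 1 (alpha_1, beta_1, theta_1):", eq1)
print("equation 2 (alpha_2, beta_2, theta_2):", eq2)
```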

In OLS we had:

\(\hat {\theta_{OLS}} = (X^TX)^{-1}X^Ty\)

\(Var [\hat {\theta_{OLS}}]=(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}\)

With IV we have

\(\hat {\theta_{IV}} = (Z^TX)^{-1}Z^Ty\)

\(Var [\hat {\theta_{IV}}]=(Z^TX)^{-1}Z^T\Omega Z(X^TZ)^{-1}\)

If \(\Omega \) is not proportional to the identity matrix, we can use weighted least squares to account for it.
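
In practice \(\Omega \) is unknown. The sketch below shows one way the IV variance formula might be evaluated. It assumes \(\Omega \) is diagonal and replaces it with the squared IV residuals, a heteroskedasticity-robust (White-style) plug-in. This is a different technique from weighted least squares, and it is not prescribed by these notes. The right-hand factor of the sandwich is the transpose of the left-hand inverse, \((X^TZ)^{-1}\). The data generating process is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)

# Heteroskedastic error: its scale depends on z (an assumption for this example).
eps = u + (1.0 + 0.5 * np.abs(z)) * rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# IV point estimate and residuals.
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
resid = y - X @ theta_iv

# Assumption: Omega is diagonal, estimated by the squared residuals.
omega_diag = resid ** 2

# Z^T Omega Z, computed without building the n x n matrix.
meat = Z.T @ (omega_diag[:, None] * Z)

# Sandwich: (Z^T X)^{-1} Z^T Omega Z (X^T Z)^{-1}.
ZX_inv = np.linalg.inv(Z.T @ X)
var_iv = ZX_inv @ meat @ ZX_inv.T
se_iv = np.sqrt(np.diag(var_iv))

print("estimate:", theta_iv)
print("standard errors:", se_iv)
```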