General Linear Models

Cross-sectional regression

The cross-sectional model

Hierarchical data

Our standard linear model is:

\(y_i=\alpha + X_i\theta +\epsilon_i\)

If we had two sets of data we could view these as:

\(y_{i,0}=\alpha_0 + X_{i,0}\theta_0 +\epsilon_{i,0}\)

\(y_{i,1}=\alpha_1 + X_{i,1}\theta_1 +\epsilon_{i,1}\)

Here, the data from group \(0\) does not affect the parameter estimates for group \(1\), and vice versa.
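As a concrete illustration, here is a minimal numpy sketch (with made-up simulated data) of estimating the two groups separately, so that each group's data only informs its own parameters:

import numpy as np

rng = np.random.default_rng(0)

# Simulate two groups with different intercepts and slopes
n = 100
x0 = rng.normal(size=n)
y0 = 1.0 + 2.0 * x0 + rng.normal(size=n)   # group 0: alpha_0 = 1, theta_0 = 2
x1 = rng.normal(size=n)
y1 = 3.0 + 0.5 * x1 + rng.normal(size=n)   # group 1: alpha_1 = 3, theta_1 = 0.5

def ols(y, x):
    # OLS of y on a constant and x; returns (alpha_hat, theta_hat)
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(y0, x0))   # estimates of (alpha_0, theta_0)
print(ols(y1, x1))   # estimates of (alpha_1, theta_1)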

Pooled data

If we think the data generating process is similar across the groups, then by restricting parameters to be shared between the models we can use more data for each estimate.

For example, if we think that all parameters are the same across the groups, we can estimate:

\(y_{i,0}=\alpha + X_{i,0}\theta +\epsilon_{i,0}\)

\(y_{i,1}=\alpha + X_{i,1}\theta +\epsilon_{i,1}\)

Or:

\(y_{ij}=\alpha + X_{ij}\theta + \epsilon_{ij}\)
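A minimal sketch of the pooled estimate, again with made-up simulated data: the two groups are simply stacked and a single \(\alpha\) and \(\theta\) are estimated from all observations.

import numpy as np

rng = np.random.default_rng(0)

# Two groups generated by the same process (shared alpha and theta)
n = 100
x0, x1 = rng.normal(size=n), rng.normal(size=n)
y0 = 1.0 + 2.0 * x0 + rng.normal(size=n)
y1 = 1.0 + 2.0 * x1 + rng.normal(size=n)

# Pooling: stack both groups and estimate one (alpha, theta) from 2n observations
x = np.concatenate([x0, x1])
y = np.concatenate([y0, y1])
X = np.column_stack([np.ones_like(x), x])
alpha_hat, theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(alpha_hat, theta_hat)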

Fixed slopes

Intercepts may be different between the groups. In this case we can instead use the model:

\(y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}\)

There are different ways of estimating this model:

  • Pooled OLS

  • Fixed effects

  • Random effects

Unbalanced data

The pooled OLS estimator

Pooled OLS

Introduction

Our model is:

\(y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}\)

The pooled OLS estimator

The fixed effects estimator

Within and between transformation

Introduction

We can group the data in two ways: one gives between differences and the other within differences.

In the above example, we could find the effects of schools or of departments.

\(y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}\)

\((y_{ij}-\bar y_{j})=(\alpha -\bar \alpha )+(X_{ij}-\bar X_{j})\theta +(\epsilon_{ij}-\bar \epsilon_{j})\)

\((y_{ij}-\bar y_{j})=(X_{ij}-\bar X_{j})\theta +(\epsilon_{ij}-\bar \epsilon_{j})\)

Or alternatively:

\((y_{ij}-\bar y_{i})=(X_{ij}-\bar X_{i})\theta +(\epsilon_{ij}-\bar \epsilon_{i})\)

Regardless of the form we choose, we can write this as:

\(\ddot y_{ij}=\ddot X_{ij}\theta +\ddot \epsilon_{ij}\)
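As a rough pandas sketch (the simulated data and column names are illustrative, not from the text), the within transformation subtracts the group means and the between transformation keeps only the group means:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative hierarchical data: observations nested in groups j
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 50),
    "x": rng.normal(size=150),
})
df["y"] = 1.0 + 2.0 * df["x"] + df["group"].map({"a": 0.0, "b": 1.0, "c": -1.0}) + rng.normal(size=150)

# Within transformation: deviations from group means, y_ij - ybar_j and x_ij - xbar_j
group_means = df.groupby("group")[["y", "x"]].transform("mean")
within = df[["y", "x"]] - group_means

# Between transformation: the group means themselves, ybar_j and xbar_j
between = df.groupby("group")[["y", "x"]].mean()

print(within.head())
print(between)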

The fixed effects estimator

Recap on the model

Our model is:

\(y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}\)

The fixed effects estimator

With fixed effects we assume that the group effect \(U_{ij}\) (the \(\xi_j\) term in the model above) is a constant for each group. That is:

\(U_{ij}=\delta_{ij}U_j\)

\(y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}+\delta_{ij}U_{j}\)

We can use this in a regression if the standard assumptions of OLS are met; in particular, group membership must be uncorrelated with the error term.

We add these dummies to \(X_{ij}\) and regress:

\(y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}\)

The parameter for the dummy is the fixed effect of group membership.

As we include the group membership dummies among the independent variables, there is no problem if group membership correlates with the other independent variables.
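A minimal least squares dummy variable (LSDV) sketch using statsmodels; the simulated data and column names are illustrative. C(group) adds a dummy for each group, and the dummy coefficients are the estimated fixed effects:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated data in which the group effect xi is correlated with x;
# this is allowed for fixed effects
groups = np.repeat(["a", "b", "c"], 50)
xi = pd.Series(groups).map({"a": 0.0, "b": 2.0, "c": -1.0}).to_numpy()
x = rng.normal(size=150) + 0.5 * xi
y = 1.0 + 2.0 * x + xi + rng.normal(size=150)
df = pd.DataFrame({"y": y, "x": x, "group": groups})

# Group dummies via C(group); their coefficients are the fixed effects
lsdv = smf.ols("y ~ x + C(group)", data=df).fit()
print(lsdv.params)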

Using the within and between transformations

\((y_{ij}-\bar y_{j})=(X_{ij}-\bar X_{j})\theta +(U_{ij} -\bar U_{j}) +(\epsilon_{ij}-\bar \epsilon_{j})\)

Because \(U_{ij}\) is constant within each group, \(U_{ij}-\bar U_{j}=0\), leaving:

\(\ddot y_{ij}=\ddot X_{ij}\theta +\ddot \epsilon_{ij}\)

This gives the same outcome as the dummy variable regression, but is a different computational process.
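A brief check of this equivalence, under the same illustrative setup as above: the slope on \(x\) from the dummy variable regression matches the slope from OLS on the group-demeaned data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

groups = np.repeat(["a", "b", "c"], 50)
xi = pd.Series(groups).map({"a": 0.0, "b": 2.0, "c": -1.0}).to_numpy()
x = rng.normal(size=150) + 0.5 * xi
y = 1.0 + 2.0 * x + xi + rng.normal(size=150)
df = pd.DataFrame({"y": y, "x": x, "group": groups})

# Slope on x from the dummy variable (LSDV) regression
lsdv_slope = smf.ols("y ~ x + C(group)", data=df).fit().params["x"]

# Slope on x from OLS on the group-demeaned (within-transformed) data
demeaned = df[["y", "x"]] - df.groupby("group")[["y", "x"]].transform("mean")
within_slope = smf.ols("y ~ x - 1", data=demeaned).fit().params["x"]

print(lsdv_slope, within_slope)   # the two estimates of theta coincide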

The random effects estimator

The random effects estimator

Introduction

Our model is:

\(y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}\)

FGLS recap

The random effects estimator

For fixed effects, we had the requirement that group membership be uncorrelated with the error term, but that it could be correlated with other independent variables.

For random effects models, group membership cannot be correlated with the other independent variables.

We have:

\(y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}+U_{ij}\)

We now model \(U_{ij}=\bar U_{j}+\rho_j\).

\(y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}+\bar U_{j}+\rho_j\)

The randomness of the effect implies, for example, that if we ran the survey again we would expect to draw a different group effect.

Clustered standard errors

Estimation

We estimate the model using (feasible) GLS.
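A hedged sketch of estimating the random intercept model; statsmodels' MixedLM fits it by (restricted) maximum likelihood rather than FGLS, but the random-intercept specification is the same. The simulated data and column names are illustrative:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated data with a random group intercept that is uncorrelated with x
group_names = [f"g{k}" for k in range(20)]
u = dict(zip(group_names, rng.normal(size=20)))   # one random effect per group
groups = np.repeat(group_names, 25)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + pd.Series(groups).map(u).to_numpy() + rng.normal(size=500)
df = pd.DataFrame({"y": y, "x": x, "group": groups})

# Random-intercept model, estimated by (restricted) maximum likelihood
re_fit = smf.mixedlm("y ~ x", data=df, groups=df["group"]).fit()
print(re_fit.params)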

Choosing the model form

The Hausman specification test

Introduction

The Hausman specification test allows you to choose between a fixed effects model and a random effects model. It compares the two sets of coefficient estimates: if they differ significantly, the random effects assumption that the group effects are uncorrelated with the regressors is rejected, and the fixed effects model is preferred.

Efficiency

When its assumptions hold, the random effects estimator is more efficient than the fixed effects estimator, because it uses both within-group and between-group variation.
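A hedged numpy/scipy sketch of the Hausman statistic, assuming you already have the fixed effects and random effects estimates of the coefficients common to both models, along with their covariance matrices; the numbers in the example are hypothetical:

import numpy as np
from scipy import stats

def hausman(b_fe, cov_fe, b_re, cov_re):
    # Hausman statistic comparing the fixed and random effects estimates of
    # the coefficients common to both models (excluding the intercept).
    # Under the null that the random effects assumptions hold, the statistic
    # is chi-squared with len(b_fe) degrees of freedom.
    diff = b_fe - b_re
    v = cov_fe - cov_re          # FE is consistent but less efficient than RE
    stat = float(diff @ np.linalg.solve(v, diff))
    dof = len(diff)
    return stat, dof, stats.chi2.sf(stat, dof)

# Hypothetical estimates for two slope coefficients
stat, dof, p = hausman(
    np.array([1.9, 0.4]), np.array([[0.010, 0.001], [0.001, 0.020]]),
    np.array([2.1, 0.5]), np.array([[0.008, 0.001], [0.001, 0.015]]),
)
print(stat, dof, p)   # a small p-value favours the fixed effects model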

The mixed effects estimator

The mixed effects estimator

Introduction

Manipulating data

Disaggregation

Used in polls

Multilevel Regression with Poststratification (Mr P)