# General Linear Models

## Cross-sectional regression

### The cross-sectional model

#### Hierarchical data

Our standard linear model is:

$$y_i=\alpha + X_i\theta +\epsilon_i$$

If we have two groups of data, indexed $$0$$ and $$1$$, we could fit a separate model to each:

$$y_{i,0}=\alpha_0 + X_{i,0}\theta_0 +\epsilon_{i,0}$$

$$y_{i,1}=\alpha_1 + X_{i,1}\theta_1 +\epsilon_{i,1}$$

Here, the data from group $$0$$ does not affect the parameters of group $$1$$, and vice versa.
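As a minimal sketch of estimating the groups separately, the code below simulates two hypothetical groups with different true parameters and runs an ordinary least squares fit on each group's own rows. The data-generating process, the parameter values and the helper names `simulate_group` and `ols` are illustrative assumptions, not part of the notes.

```python
import numpy as np

def simulate_group(alpha, theta, n, rng):
    """Draw n observations from y = alpha + x * theta + eps (hypothetical DGP)."""
    x = rng.normal(size=n)
    eps = rng.normal(scale=0.1, size=n)
    return x, alpha + x * theta + eps

def ols(x, y):
    """OLS of y on an intercept and x; returns [alpha_hat, theta_hat]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
x0, y0 = simulate_group(alpha=1.0, theta=2.0, n=200, rng=rng)
x1, y1 = simulate_group(alpha=-1.0, theta=0.5, n=200, rng=rng)

# Each group is fitted on its own rows only, so the two fits cannot interact.
coef0 = ols(x0, y0)
coef1 = ols(x1, y1)
print(coef0, coef1)
```

Because each fit only sees its own group's rows, changing `y0` would leave `coef1` untouched.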

#### Pooled data

If we think the data-generating process is similar across groups, then restricting the parameters to be shared across groups lets us use more data for each estimate.

For example, if we think all parameters are the same for both groups, we can estimate:

$$y_{i,0}=\alpha + X_{i,0}\theta +\epsilon_{i,0}$$

$$y_{i,1}=\alpha + X_{i,1}\theta +\epsilon_{i,1}$$

Or, stacking the groups and letting $$j$$ index the group:

$$y_{ij}=\alpha + X_{ij}\theta + \epsilon_{ij}$$
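A minimal sketch of the pooled model, assuming simulated data in which both groups genuinely share $$\alpha=1$$ and $$\theta=2$$: the rows of both groups are stacked and a single pair of parameters is fitted. Comparing its standard error with a fit on group $$0$$ alone illustrates the gain from using more data per estimate. The helper `ols_with_se` and all numbers are hypothetical.

```python
import numpy as np

def ols_with_se(design, y):
    """OLS coefficients and their classical standard errors."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, np.sqrt(np.diag(cov))

# Hypothetical data: both groups share alpha = 1.0 and theta = 2.0.
rng = np.random.default_rng(1)
n_per_group = 100
group = np.repeat([0, 1], n_per_group)          # j index of each row
x = rng.normal(size=group.size)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=group.size)
design = np.column_stack([np.ones_like(x), x])

# Pooled: one (alpha, theta) fitted to the stacked rows of both groups.
coef_pooled, se_pooled = ols_with_se(design, y)

# Group 0 alone, for comparison.
mask = group == 0
coef_g0, se_g0 = ols_with_se(design[mask], y[mask])
print(se_pooled[1], se_g0[1])  # the pooled standard error is expected to be smaller
```

The pooled estimate is only appropriate if the shared-parameter assumption is right; if the groups really differ, pooling trades variance for bias.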

#### Fixed slopes

The intercepts may differ between groups while the slopes $$\theta$$ stay shared. In this case we can instead add a group-specific term $$\xi_j$$:

$$y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}$$

There are different ways of estimating this model:

• Pooled OLS

• Fixed effects

• Random effects

## The pooled OLS estimator

### Pooled OLS

#### Introduction

Our model is:

$$y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}$$
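Pooled OLS ignores the group term: it regresses $$y_{ij}$$ on $$X_{ij}$$ and leaves $$\xi_j$$ inside the error. The sketch below, on hypothetical simulated data with true $$\theta=2$$, illustrates that this works when $$\xi_j$$ is unrelated to $$X_{ij}$$ but is biased when $$X_{ij}$$ is partly driven by $$\xi_j$$. The simulation design and the helper `pooled_ols_slope` are assumptions for illustration.

```python
import numpy as np

def pooled_ols_slope(x, y):
    """Slope from regressing y on an intercept and x, ignoring groups."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(2)
n_groups, n_per_group = 100, 20
group = np.repeat(np.arange(n_groups), n_per_group)
xi = rng.normal(size=n_groups)[group]          # group effect xi_j, repeated for each row
eps = rng.normal(size=group.size)

# Case 1: xi_j unrelated to x.
x_indep = rng.normal(size=group.size)
y_indep = 1.0 + 2.0 * x_indep + xi + eps

# Case 2: x partly driven by the group effect.
x_corr = xi + rng.normal(size=group.size)
y_corr = 1.0 + 2.0 * x_corr + xi + eps

theta_indep = pooled_ols_slope(x_indep, y_indep)  # should be close to 2
theta_corr = pooled_ols_slope(x_corr, y_corr)     # pulled upwards by the omitted xi_j
print(theta_indep, theta_corr)
```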

## The fixed effects estimator

### Within and between transformation

#### Introduction

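We can demean the data in two ways: subtracting the mean over $$i$$ within each group $$j$$ (written $$\bar y_j$$) leaves the within-group differences, while subtracting the mean over $$j$$ for each $$i$$ (written $$\bar y_i$$) leaves the differences between groups.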

In the above example, we could find the effects of schools, or of departments.

$$y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}$$

$$(y_{ij}-\bar y_{j})=(\alpha -\bar \alpha )+(X_{ij}-\bar X_{j})\theta +(\epsilon_{ij}-\bar \epsilon_{j})$$

Because $$\alpha$$ is a constant, $$\bar\alpha=\alpha$$ and the intercept drops out:

$$(y_{ij}-\bar y_{j})=(X_{ij}-\bar X_{j})\theta +(\epsilon_{ij}-\bar \epsilon_{j})$$

Or alternatively:

$$(y_{ij}-\bar y_{i})=(X_{ij}-\bar X_{i})\theta +(\epsilon_{ij}-\bar \epsilon_{i})$$

Regardless of the form we choose, we can write this as:

$$\ddot y_{ij}=\ddot X_{ij}\theta +\ddot \epsilon_{ij}$$
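The double-dot transformation by group $$j$$ can be sketched as a small NumPy function. It assumes integer group labels $$0,\dots,J-1$$ that all occur at least once; the name `within_transform` is hypothetical. Transforming a column of ones returns zeros, which is the intercept dropping out.

```python
import numpy as np

def within_transform(v, group):
    """Subtract each group's mean from v (the double-dot transformation).

    v: array of shape (n,) or (n, k); group: integer labels 0..J-1 of shape (n,),
    with every label present at least once.
    """
    v = np.asarray(v, dtype=float)
    counts = np.bincount(group)
    # np.add.at accumulates the rows that share a group label.
    sums = np.zeros((counts.size,) + v.shape[1:])
    np.add.at(sums, group, v)
    means = sums / counts.reshape((-1,) + (1,) * (v.ndim - 1))
    return v - means[group]

group = np.array([0, 0, 1, 1, 1])
y = np.array([1.0, 3.0, 2.0, 4.0, 6.0])
print(within_transform(y, group))  # [-1.  1. -2.  0.  2.]
```

Demeaning by $$i$$ instead is the same operation with the $$i$$ labels passed as `group`.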

### The fixed effects estimator

#### Recap on the model

Our model is:

$$y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}$$

#### The fixed effects estimator

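With fixed effects we write the group term as $$U_{ij}$$ (the $$\xi_j$$ above) and assume it is a constant for each group. With the dummy $$\delta_{jk}$$ equal to $$1$$ if $$j=k$$ and $$0$$ otherwise, that is:

$$U_{ij}=\sum_k\delta_{jk}U_k$$

$$y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}+\sum_k\delta_{jk}U_{k}$$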

We can estimate this by OLS if the standard OLS assumptions are met. In particular, group membership must be uncorrelated with the error term $$\epsilon_{ij}$$.

We add these dummies to $$X_{ij}$$, dropping one group's dummy (or the intercept) to avoid perfect collinearity, and regress:

$$y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}$$

The coefficient on each dummy is the fixed effect of membership in that group.

Because group membership is included among the independent variables, it does not matter if group membership is correlated with the other independent variables.
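A minimal sketch of this dummy-variable (least squares dummy variable) regression, assuming hypothetical simulated data where $$X_{ij}$$ is correlated with the group effect. Group $$0$$ is used as the reference category, so each estimated group effect is measured relative to group $$0$$. The function name `lsdv` and the simulation are illustrative assumptions.

```python
import numpy as np

def lsdv(x, y, group):
    """Regress y on an intercept, x, and dummies for groups 1..J-1.

    Group 0 is the reference category, which avoids collinearity with the intercept.
    Returns (theta_hat, alpha_hat, group_effects), where group_effects[j] is the
    estimated effect of group j relative to group 0.
    """
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    n_groups = group.max() + 1
    dummies = (group[:, None] == np.arange(1, n_groups)).astype(float)
    design = np.column_stack([np.ones(len(y)), x, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = x.shape[1]
    alpha_hat = coef[0]
    theta_hat = coef[1:1 + k]
    group_effects = np.concatenate([[0.0], coef[1 + k:]])
    return theta_hat, alpha_hat, group_effects

rng = np.random.default_rng(3)
n_groups, n_per_group = 30, 20
group = np.repeat(np.arange(n_groups), n_per_group)
xi = rng.normal(size=n_groups)
x = xi[group] + rng.normal(size=group.size)   # x correlated with the group effect
y = 1.0 + 2.0 * x + xi[group] + rng.normal(scale=0.5, size=group.size)

theta_hat, alpha_hat, effects = lsdv(x, y, group)
print(theta_hat)  # should be close to the true theta of 2 despite the correlation
```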

#### Using the within and between transformations

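$$(y_{ij}-\bar y_{j})=(X_{ij}-\bar X_{j})\theta +(U_{ij} -\bar U_{j}) +(\epsilon_{ij}-\bar \epsilon_{j})$$

Since $$U_{ij}$$ is constant within group $$j$$, $$\bar U_{j}=U_{ij}$$ and the fixed effect drops out, leaving: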

$$\ddot y_{ij}=\ddot X_{ij}\theta +\ddot \epsilon_{ij}$$

This gives the same estimate of $$\theta$$ as the dummy-variable regression, but through a different computation.
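The equivalence can be checked numerically. This sketch, on hypothetical simulated data, computes $$\theta$$ once by demeaning within groups and regressing without an intercept, and once by OLS on $$X_{ij}$$ plus a full set of group dummies (no separate intercept, which is an equivalent parameterisation to an intercept plus $$J-1$$ dummies). The helper `demean` is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n_groups, n_per_group = 10, 15
group = np.repeat(np.arange(n_groups), n_per_group)
xi = rng.normal(size=n_groups)
x = xi[group] + rng.normal(size=group.size)
y = 1.0 + 2.0 * x + xi[group] + rng.normal(size=group.size)

def demean(v):
    """Subtract the group mean of v from each observation."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

# Route 1: within transformation, then OLS through the origin.
x_dd, y_dd = demean(x), demean(y)
theta_within = (x_dd @ y_dd) / (x_dd @ x_dd)

# Route 2: OLS on x plus one dummy per group.
dummies = (group[:, None] == np.arange(n_groups)).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)
theta_lsdv = coef[0]
print(theta_within, theta_lsdv)  # equal up to floating-point error
```

The within route avoids building a column per group, which matters when there are many groups.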

## The random effects estimator

### The random effects estimator

#### Introduction

Our model is:

$$y_{ij}=\alpha + X_{ij}\theta + \xi_j + \epsilon_{ij}$$

#### The random effects estimator

For fixed effects, we required group membership to be uncorrelated with the error term, but allowed it to be correlated with the other independent variables.

For random effects models, the group effect must also be uncorrelated with the other independent variables.

We have:

$$y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}+U_{ij}$$

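We now model the group effect as random: $$U_{ij}=\bar U+\rho_j$$, where $$\bar U$$ is the mean effect across groups and $$\rho_j$$ is a random deviation for group $$j$$ with mean zero.

$$y_{ij}=\alpha + X_{ij}\theta +\epsilon_{ij}+\bar U+\rho_j$$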

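This randomness of the effect implies, for example, that if we ran the survey again we would expect to draw different group effects $$\rho_j$$.

Because $$\rho_j$$ is shared by every observation in group $$j$$, the composite errors $$\rho_j+\epsilon_{ij}$$ are correlated within groups, so we estimate the model by GLS rather than OLS.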
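A minimal GLS sketch, assuming equal group sizes $$n$$ and known variance components $$\sigma_\epsilon$$ and $$\sigma_\rho$$ (in practice they are estimated first, giving feasible GLS). For this error structure GLS is equivalent to OLS on quasi-demeaned data, subtracting a fraction $$\lambda=1-\sqrt{\sigma_\epsilon^2/(\sigma_\epsilon^2+n\sigma_\rho^2)}$$ of each group mean. The function name `random_effects_gls` and the simulated data are hypothetical.

```python
import numpy as np

def random_effects_gls(x, y, group, sigma_eps, sigma_rho):
    """GLS for y_ij = alpha + x_ij * theta + rho_j + eps_ij with balanced groups.

    Assumes sigma_eps and sigma_rho are known. Returns [alpha_hat, theta_hat].
    """
    counts = np.bincount(group)
    n = counts[0]
    assert np.all(counts == n), "this sketch assumes equal group sizes"
    lam = 1.0 - np.sqrt(sigma_eps**2 / (sigma_eps**2 + n * sigma_rho**2))
    x_bar = (np.bincount(group, weights=x) / counts)[group]
    y_bar = (np.bincount(group, weights=y) / counts)[group]
    # Quasi-demeaning: subtract only the fraction lam of each group mean.
    design = np.column_stack([np.full(len(y), 1.0 - lam), x - lam * x_bar])
    coef, *_ = np.linalg.lstsq(design, y - lam * y_bar, rcond=None)
    return coef

rng = np.random.default_rng(5)
n_groups, n_per_group = 200, 5
group = np.repeat(np.arange(n_groups), n_per_group)
rho = rng.normal(scale=1.0, size=n_groups)     # random group effects, independent of x
x = rng.normal(size=group.size)
y = 1.0 + 2.0 * x + rho[group] + rng.normal(scale=1.0, size=group.size)

alpha_hat, theta_hat = random_effects_gls(x, y, group, sigma_eps=1.0, sigma_rho=1.0)
print(alpha_hat, theta_hat)
```

With $$\lambda=0$$ this reduces to pooled OLS and with $$\lambda=1$$ to the within estimator, so random effects sits between the two.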

## Choosing the model form

### The Hausman specification test

#### Introduction

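The Hausman specification test helps choose between a fixed effects model and a random effects model. It compares the two estimates of $$\theta$$: fixed effects is consistent whether or not the group effect is correlated with $$X_{ij}$$, while random effects is consistent only if it is not, so a large difference between them is evidence against random effects.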

#### Efficiency

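When its assumptions hold, the random effects estimator is more efficient than fixed effects, so it is preferred if the Hausman test does not reject.

Random effects models are used in polls.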
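A minimal sketch of the test statistic $$H=(\hat\theta_{FE}-\hat\theta_{RE})'[\hat V_{FE}-\hat V_{RE}]^{-1}(\hat\theta_{FE}-\hat\theta_{RE})$$, which under the null is approximately chi-squared with as many degrees of freedom as compared coefficients. The function name `hausman` and the input numbers are hypothetical; in finite samples $$\hat V_{FE}-\hat V_{RE}$$ need not be positive definite, which this sketch does not handle.

```python
import numpy as np

def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic comparing fixed and random effects estimates.

    b_fe, b_re: coefficients on the same regressors (intercept excluded).
    v_fe, v_re: their estimated covariance matrices.
    Returns (H, degrees_of_freedom).
    """
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    v_diff = np.atleast_2d(np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float))
    # Solve v_diff @ z = d rather than forming the inverse explicitly.
    stat = float(d @ np.linalg.solve(v_diff, d))
    return stat, d.size

# Hypothetical estimates for a single slope coefficient.
stat, df = hausman([1.0], [0.8], [[0.02]], [[0.01]])
print(stat, df)  # stat is about 4.0 with 1 degree of freedom
```

Here $$H\approx4.0$$ exceeds $$3.84$$, the 5% critical value of a chi-squared distribution with one degree of freedom, so these made-up numbers would lead us to reject random effects in favour of fixed effects.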