Regularising neural networks

Feature normalisation
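A minimal sketch of one common form (standardising each input feature with training-set statistics; the data and constants are assumed):

```python
import numpy as np

X_train = np.random.rand(100, 3)      # stand-in training data
mu = X_train.mean(axis=0)             # per-feature mean
sigma = X_train.std(axis=0) + 1e-8    # per-feature std (avoid division by zero)
X_norm = (X_train - mu) / sigma       # zero mean, unit variance per feature
```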

Dropout, and dropout layers
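A minimal sketch of a dropout layer (inverted dropout, with an assumed drop probability):

```python
import numpy as np

def dropout(a, p=0.5, training=True):
    """Zero each activation with probability p; scale the survivors by 1/(1-p)
    so the expected value is unchanged. At test time, pass activations through."""
    if not training:
        return a
    mask = np.random.rand(*a.shape) >= p
    return a * mask / (1.0 - p)
```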

\(L_2\) regularisation (including how to change the backprop algorithm)
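A minimal sketch of the change to backprop: the penalty \(\lambda\sum W^2\) adds \(2\lambda W\) to every weight gradient (the values below are assumed):

```python
import numpy as np

lam = 1e-4                              # regularisation strength (assumed)
W = np.random.randn(4, 3)               # weights of one layer
grad_data = np.random.randn(4, 3)       # stand-in gradient of the data loss w.r.t. W

grad_total = grad_data + 2 * lam * W    # extra 2*lam*W term from the L2 penalty
W -= 0.1 * grad_total                   # otherwise an ordinary gradient step
```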

Sparse networks

Parameters are set to \(0\) and not trained.
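A minimal sketch, assuming a fixed random pruning mask: masked weights are set to \(0\) and receive no gradient updates.

```python
import numpy as np

W = np.random.randn(4, 3)
mask = np.random.rand(4, 3) > 0.8       # keep roughly 20% of the weights (assumed)
W *= mask                               # pruned weights are set to 0

grad = np.random.randn(4, 3)            # stand-in gradient
W -= 0.1 * (grad * mask)                # pruned weights get no update, so they stay 0
```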

Parameter sharing

Parameters share the same value and are trained together.
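A minimal sketch with one value tied across several (assumed) positions of a weight matrix: the tied gradients are summed and a single update moves them all.

```python
import numpy as np

w_shared = 0.5                              # single shared value
positions = [(0, 0), (1, 2), (3, 1)]        # tied positions (assumed)

W = np.zeros((4, 3))
for i, j in positions:
    W[i, j] = w_shared                      # every tied entry holds the same value

grad_W = np.random.randn(4, 3)              # stand-in gradient w.r.t. the full matrix
grad_shared = sum(grad_W[i, j] for i, j in positions)   # sum the tied gradients
w_shared -= 0.1 * grad_shared               # one update trains all tied entries together
```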

Weight decay

After each update, multiply each parameter by a constant \(p<1\).
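A minimal sketch (the learning rate and \(p\) are assumed values):

```python
import numpy as np

p = 0.999                       # decay factor, p < 1 (assumed)
W = np.random.randn(4, 3)
grad = np.random.randn(4, 3)    # stand-in gradient

W -= 0.1 * grad                 # ordinary update
W *= p                          # then shrink every parameter towards 0
```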

The anomaly detection problem

An input can be perturbed to obtain any desired classification.

Early stopping
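A minimal sketch, assuming a hypothetical validate() stand-in for one epoch of training plus validation: stop once the validation loss has not improved for a set number of epochs, keeping the best parameters seen so far.

```python
import numpy as np

def validate(epoch):
    # stand-in for a real train-one-epoch + validation pass (hypothetical)
    return 1.0 / (epoch + 1) + 0.01 * np.random.rand()

best_loss, best_weights = float("inf"), None
patience, bad_epochs = 5, 0
W = np.random.randn(4, 3)                    # stand-in model parameters

for epoch in range(1000):
    val_loss = validate(epoch)               # validation loss after this epoch
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
        best_weights = W.copy()              # remember the best parameters so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # validation stopped improving: stop training
            break
```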

Residual blocks

In a node we have:

\(a_{ij}=\sigma_{ij}(W_{ij}a_{i-1})\)

That is, the value of a node is its activation function applied to the weighted sum of the previous layer's values.

Residual blocks, however, look further back than one layer. They also add in the value of a node from an earlier layer \(k\), without applying a weight to it:

\(a_{ij}=\sigma_{ij}(W_{ij}a_{i-1}+a_{kj})\)
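A minimal sketch of one such layer, assuming a ReLU activation and matching shapes for the skip connection:

```python
import numpy as np

def sigma(x):
    return np.maximum(0.0, x)            # ReLU as the activation (assumed choice)

def residual_layer(a_prev, a_k, W):
    """Weighted sum of the previous layer plus the raw (unweighted)
    activations a_k of an earlier layer, then the activation."""
    return sigma(W @ a_prev + a_k)

a_prev = np.random.randn(4)              # previous layer's activations
a_k = np.random.randn(4)                 # activations from an earlier layer k
W = np.random.randn(4, 4)
a_next = residual_layer(a_prev, a_k, W)
```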