Classification trees

Simple decision trees

Tree traversal

Training decision trees with information gain

We can train a decision tree by starting with the simplest possible tree: a single node containing every training example.

We can then do a greedy search to identify the best split of that node.

We can then iterate this process on the resulting child nodes.
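
As a sketch, the greedy search for a single split might look like this, assuming one numeric feature and an entropy-based gain (function names are illustrative):

```python
import numpy as np

def entropy(y):
    # Shannon entropy of a 1-D array of class labels.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(x, y):
    # Greedy search over one numeric feature: try the midpoint between
    # every pair of adjacent distinct values and keep the threshold
    # with the highest information gain.
    values = np.sort(np.unique(x))
    best_gain, best_threshold = 0.0, None
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = entropy(y) - child
        if gain > best_gain:
            best_gain, best_threshold = gain, t
    return best_threshold, best_gain
```

Iterating the same search on each child node grows the full tree.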

Training with information gain

We split nodes so as to maximise the reduction in entropy.

Entropy is:

\(E = -\sum_{i=1}^{n} p_i \log_2 p_i\)

Where we sum over the \(n\) classes, and \(p_i\) is the proportion of examples in the node belonging to class \(i\).
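
A minimal NumPy sketch of this entropy calculation:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a 1-D array of class labels,
    # summing over the classes present in the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))
```

A 50/50 split of two classes gives 1 bit of entropy; a pure node gives 0.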

Information gain

The information gain of a split is the original entropy minus the entropy of each branch, weighted by the fraction of examples that branch receives.
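
As a sketch, assuming a branch is represented simply as the array of labels that reaches it:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, branches):
    # Parent entropy minus the size-weighted entropy of each branch.
    n = len(parent)
    weighted = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(parent) - weighted
```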

Information gain ratio

The gain ratio divides the information gain by the entropy of the split itself, penalising splits that produce many small branches.

Training decision trees with Gini impurity
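
Gini impurity, \(1 - \sum_i p_i^2\), is an alternative splitting criterion to entropy; a minimal sketch:

```python
import numpy as np

def gini(labels):
    # Gini impurity: the probability that two examples drawn at random
    # (with replacement) from the node belong to different classes.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)
```

Splits are chosen to minimise the weighted Gini impurity of the children, analogously to information gain.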

Pruning decision trees

Training a decision tree until every leaf contains a single entry from the training set will result in overfitting.

We can use pruning to regularise trees.


Reduced error pruning

Starting from the bottom, replace each node with a leaf predicting its most popular class. Accept the change if there is no reduction in accuracy (measured on a held-out validation set).
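
A sketch, assuming a hypothetical nested-dict tree representation in which leaves are plain class labels and each internal node records the majority class (`majority`) seen during training:

```python
import numpy as np

def predict(node, x):
    # Walk down the tree; leaves are plain class labels.
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def reduced_error_prune(node, X_val, y_val):
    # Bottom-up: prune the children first, then try collapsing this
    # node to a leaf of its majority class. Keep the leaf if it makes
    # no more validation errors than the subtree did.
    if not isinstance(node, dict):
        return node
    mask = X_val[:, node["feature"]] <= node["threshold"]
    node["left"] = reduced_error_prune(node["left"], X_val[mask], y_val[mask])
    node["right"] = reduced_error_prune(node["right"], X_val[~mask], y_val[~mask])
    subtree_errors = sum(predict(node, x) != t for x, t in zip(X_val, y_val))
    leaf_errors = np.sum(y_val != node["majority"])
    return node["majority"] if leaf_errors <= subtree_errors else node
```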

Cost complexity pruning

Take the full tree \(T_0\).

Iteratively find the subtree whose replacement with a leaf is cheapest under a cost function trading off accuracy against the number of leaves.

Remove this subtree, generating \(T_{i+1}\).

When only the root remains, choose a single tree from the sequence using CV.
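
scikit-learn implements this as minimal cost-complexity pruning; a sketch of selecting the pruning strength `ccp_alpha` by cross-validation (the dataset choice is illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The pruning path gives the effective alphas at which successive
# subtrees T_0, T_1, ... collapse, down to the root.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Choose a single tree from the sequence by cross-validation.
scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha), X, y, cv=5
    ).mean()
    for alpha in path.ccp_alphas
]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
```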

Growing and pruning

Generally we split the data: grow the tree with one set, then prune with the other.

We can split our data up and iterate between growing and pruning.

When pruning, for each pair of leaves we test to see if they should be merged.

If our two sets are \(A\) and \(B\) we can do:

  • \(A\): Grow

  • \(B\): Prune

  • \(B\): Grow

  • \(A\): Prune

And repeat this process.

Partial regression trees

Once we have built a tree, we keep a single leaf and discard the rest.

Decision trees with many classes

Training decision trees with many classes


Training with unbalanced data

An unbalanced dataset contains more examples of one class than of the others.

We can undersample the majority class, or synthetically generate minority-class examples.
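
A sketch of random undersampling of the majority class (the function name is illustrative):

```python
import numpy as np

def undersample(X, y, rng=None):
    # Randomly keep only as many rows of each class as the rarest
    # class has, discarding the surplus majority-class rows.
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]
```

Synthetic generation of minority examples (e.g. SMOTE-style interpolation) is the complementary approach.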