# The Naive Bayes classifier

## Naive Bayes

### The Naive Bayes posterior

#### Bayes' theorem

Consider Bayes' theorem:

$$P(y|x_1,x_2,...,x_n)=\dfrac{P(x_1,x_2,...,x_n|y)P(y)}{P(x_1,x_2,...,x_n)}$$

Here, $$y$$ is the label, and $$x_1,x_2,...,x_n$$ is the evidence. We want to know the probability of each label given the evidence.

The denominator, $$P(x_1,x_2,...,x_n)$$, is the same for every label, so to compare labels we only need the numerator:

$$P(y|x_1,x_2,...,x_n)\propto P(x_1,x_2,...,x_n|y)P(y)$$
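As a small illustration of why the denominator can be dropped, the sketch below normalises a pair of made-up numerator values and checks that the most probable label is unchanged. The labels and numbers are hypothetical and chosen only for the example.

```python
# Hypothetical values of P(x_1,...,x_n | y) * P(y) for two labels (made-up numbers).
unnormalised = {"spam": 0.03, "ham": 0.01}

# The denominator P(x_1,...,x_n) is the sum of these numerators over all labels.
evidence = sum(unnormalised.values())
posterior = {label: score / evidence for label, score in unnormalised.items()}

# Dividing every score by the same positive constant does not change the ranking.
best_unnormalised = max(unnormalised, key=unnormalised.get)
best_posterior = max(posterior, key=posterior.get)
print(best_unnormalised, best_posterior)
```

Normalising is only needed if the actual posterior probabilities are wanted, rather than just the most probable label.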

#### The assumption of Naive Bayes

We assume the $$x_i$$ are conditionally independent of one another given the label $$y$$. Therefore:

$$P(x_1,x_2,...,x_n|y)=P(x_1|y)P(x_2|y)...P(x_n|y)$$

$$P(y|x_1,x_2,...,x_n)\propto P(x_1|y)P(x_2|y)...P(x_n|y)P(y)$$

### The Naive Bayes classifier

#### Calculating the Naive Bayes estimator

With the Naive Bayes assumption we have:

$$P(y|x_1,x_2,...,x_n)\propto P(x_1|y)P(x_2|y)...P(x_n|y)P(y)$$

We then choose the label $$y$$ that maximises this expression.

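The factorised form is easier to estimate than the full joint likelihood $$P(x_1,x_2,...,x_n|y)$$, because each factor $$P(x_i|y)$$ can be estimated separately. The sample therefore does not need to contain every combination of evidence values for every label.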

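This approach is used when the evidence, like the label, falls into discrete classes. Under a continuous distribution the probability of any individual outcome is $$0$$, so probabilities cannot be estimated by counting outcomes.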
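The decision rule can be sketched as follows, assuming the probabilities are already known. The function name `predict`, the data layout, and all numbers are hypothetical and chosen only for illustration.

```python
def predict(priors, likelihoods, evidence):
    """Return the label maximising P(y) * P(x_1|y) * ... * P(x_n|y), plus all scores.

    priors:      {label: P(y)}
    likelihoods: {label: [ {value: P(x_i = value | y)} for each feature i ]}
    evidence:    list of observed values, one per feature
    """
    scores = {}
    for label, prior in priors.items():
        score = prior
        for i, value in enumerate(evidence):
            # A value missing from the table is treated as having probability 0.
            score *= likelihoods[label][i].get(value, 0.0)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Made-up probabilities for two labels and two categorical features.
priors = {"play": 0.6, "stay": 0.4}
likelihoods = {
    "play": [{"sunny": 0.5, "rainy": 0.5}, {"windy": 0.2, "calm": 0.8}],
    "stay": [{"sunny": 0.25, "rainy": 0.75}, {"windy": 0.5, "calm": 0.5}],
}
label, scores = predict(priors, likelihoods, ["rainy", "windy"])
print(label, scores)
```

With these numbers the score for `play` is 0.6 × 0.5 × 0.2 = 0.06 and the score for `stay` is 0.4 × 0.75 × 0.5 = 0.15, so `stay` is chosen.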

#### Estimating $$P(y)$$

We can estimate $$P(y)$$ from the relative frequency of each label in the sample.
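A minimal sketch of this frequency estimate, assuming the labels are given as a plain Python list (the function name `estimate_priors` and the example labels are hypothetical):

```python
from collections import Counter


def estimate_priors(labels):
    """Estimate P(y) as the fraction of the sample carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


priors = estimate_priors(["spam", "ham", "ham", "ham"])
print(priors)
```

Here one of the four instances is `spam` and three are `ham`, so the estimates are 0.25 and 0.75.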

#### Estimating $$P(x_1|y)$$

Normally, $$P(x_1=c|y)=\dfrac{n_c}{n_y}$$, where:

• $$n_c$$ is the number of instances where the evidence $$x_1$$ takes the value $$c$$ and the label is $$y$$.

• $$n_y$$ is the number of instances where the label is $$y$$.
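The counting estimate can be sketched as below, assuming one feature is stored as a list of values aligned with a list of labels. The function name `estimate_conditional` and the toy data are hypothetical.

```python
def estimate_conditional(values, labels, c, y):
    """Estimate P(x_1 = c | y) as n_c / n_y by counting instances."""
    # n_y: instances whose label is y.
    n_y = sum(1 for label in labels if label == y)
    if n_y == 0:
        raise ValueError(f"label {y!r} does not occur in the sample")
    # n_c: instances whose label is y and whose feature value is c.
    n_c = sum(1 for value, label in zip(values, labels) if value == c and label == y)
    return n_c / n_y


values = ["sunny", "rainy", "sunny", "rainy"]
labels = ["play", "play", "stay", "play"]
p_sunny_given_play = estimate_conditional(values, labels, "sunny", "play")
p_sunny_given_stay = estimate_conditional(values, labels, "sunny", "stay")
print(p_sunny_given_play, p_sunny_given_stay)
```

In this toy sample three instances are labelled `play` and one of them is `sunny`, so the estimate of $$P(\text{sunny}|\text{play})$$ is $$\dfrac{1}{3}$$.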

#### Regularising the Naive Bayes estimator

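If a value of the evidence never occurs with a given label in the sample, its estimated probability is zero, and that single zero factor makes the whole product zero. To reduce this risk, we can adjust the estimates, so that: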

$$P(x_1|y)=\dfrac{n_c+mp}{n_y+m}$$, where:

• $$p$$ is a prior estimate of the probability in question. If this is unknown, use $$\dfrac{1}{k}$$, where $$k$$ is the number of classes the evidence $$x_1$$ can fall into.

• $$m$$ is a parameter called the equivalent sample size.
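The adjusted estimate is a one-line formula. The sketch below evaluates it for made-up counts to show how it keeps an unseen value away from zero. The function name `m_estimate` and the numbers are hypothetical.

```python
def m_estimate(n_c, n_y, p, m):
    """Return the smoothed estimate (n_c + m * p) / (n_y + m)."""
    return (n_c + m * p) / (n_y + m)


# A value never seen with this label: the raw estimate 0 / 10 would be zero.
smoothed = m_estimate(n_c=0, n_y=10, p=0.5, m=2)

# With m = 0 the formula reduces to the unadjusted estimate n_c / n_y.
unsmoothed = m_estimate(n_c=3, n_y=10, p=0.5, m=0)
print(smoothed, unsmoothed)
```

The estimate behaves as if $$m$$ extra instances had been observed, distributed according to $$p$$. A larger $$m$$ pulls the estimate more strongly towards $$p$$.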

### Text classification using Naive Bayes

#### Naive Bayes and text classification

Naive Bayes can be used to classify text documents. The $$x$$ variables can be appearances of each word, and $$y$$ can be the document classification.
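The pieces above can be combined into a small text classifier. This is a minimal sketch under several assumptions that are not from the text: each $$x_i$$ records only whether word $$i$$ is present in the document, words are split on whitespace, the smoothed estimate uses $$p=\dfrac{1}{2}$$ with $$m=1$$, and logarithms are summed instead of multiplying probabilities to avoid numerical underflow. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(documents, labels, m=1.0):
    """Collect the counts needed to estimate P(y) and P(word present | y)."""
    vocabulary = {word for doc in documents for word in doc.split()}
    label_counts = Counter(labels)  # n_y for each label
    # doc_counts[y][w]: number of documents with label y that contain word w (n_c).
    doc_counts = defaultdict(Counter)
    for doc, label in zip(documents, labels):
        doc_counts[label].update(set(doc.split()))
    return vocabulary, label_counts, doc_counts, m


def classify(model, document):
    """Return the label with the highest log of P(y) * prod_i P(x_i | y)."""
    vocabulary, label_counts, doc_counts, m = model
    words = set(document.split())
    total = sum(label_counts.values())
    scores = {}
    for label, n_y in label_counts.items():
        score = math.log(n_y / total)  # log P(y)
        for word in vocabulary:
            # Smoothed estimate with p = 1/2 (assumed: a word is present or absent).
            p_present = (doc_counts[label][word] + m * 0.5) / (n_y + m)
            score += math.log(p_present if word in words else 1.0 - p_present)
        scores[label] = score
    return max(scores, key=scores.get)


documents = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda attached",
    "lunch meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]
model = train(documents, labels)
print(classify(model, "cheap watches now"))
print(classify(model, "meeting tomorrow"))
```

Words that do not appear in the training vocabulary are ignored here. Other designs, such as using word counts rather than presence, are equally possible.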