# Support Vector Machines (SVMs)

## Linear Support Vector Classifiers (SVCs)

### Hard-margin SVC

#### Linear separators

We want to find a hyperplane that separates the classes.

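Consider a binary classification problem with training data $$(x_i, y_i)$$ for $$i = 1, \dots, n$$, where each $$x_i$$ is a feature vector and each label is $$y_i \in \{-1, +1\}$$.

A hyperplane is the set of points $$x$$ satisfying $$w \cdot x - b = 0$$, where $$w$$ is a normal vector to the hyperplane and $$b$$ sets its offset from the origin. A point is classified by which side of the hyperplane it lies on, i.e. by the sign of $$w \cdot x - b$$.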
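The following is a minimal sketch of classifying a point by the sign of $$w \cdot x - b$$. It uses plain Python with no libraries. The helper names `dot`, `decision_value` and `predict`, and the example hyperplane, are illustrative assumptions. Points lying exactly on the hyperplane are assigned to +1 here, which is an arbitrary tie-breaking choice.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def decision_value(w, b, x):
    """Signed value w.x - b; it is zero exactly on the hyperplane."""
    return dot(w, x) - b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 1 = 0, i.e. w = (1, 1), b = 1.
w, b = [1.0, 1.0], 1.0
print(predict(w, b, [2.0, 2.0]))  # 1
print(predict(w, b, [0.0, 0.0]))  # -1
```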

#### Hard margin

If the training data are linearly separable, there exists a hyperplane that classifies every point correctly.

In general there are infinitely many such hyperplanes.

To choose between them, we select two parallel hyperplanes that separate the classes, with the distance between them as large as possible. The region between these two hyperplanes is the margin.

The maximum-margin hyperplane is the one lying halfway between the two margin hyperplanes.

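Because $$w$$ and $$b$$ can be multiplied by any positive constant without changing the hyperplane, we can rescale them so that the two margin hyperplanes are:

$$w \cdot x - b = 1$$

$$w \cdot x - b = -1$$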

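The distance between these two parallel hyperplanes is $$\dfrac{2}{\|w\|}$$, since the distance from a point $$x$$ to the hyperplane $$w \cdot x - b = c$$ is $$\dfrac{|w \cdot x - b - c|}{\|w\|}$$.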
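This sketch checks the margin-width formula numerically. It takes a point on $$w \cdot x - b = 1$$ and measures its distance to $$w \cdot x - b = -1$$ using the point-to-hyperplane distance $$|w \cdot x - b - c| / \|w\|$$, then compares the result with $$2 / \|w\|$$. The helper names and the values of $$w$$, $$b$$ and the point are hypothetical.

```python
import math

def norm(w):
    """Euclidean length ||w||."""
    return math.sqrt(sum(wi * wi for wi in w))

def distance_to_plane(w, b, c, x):
    """Perpendicular distance from point x to the hyperplane w.x - b = c."""
    value = sum(wi * xi for wi, xi in zip(w, x)) - b
    return abs(value - c) / norm(w)

def margin_width(w):
    """Distance between the hyperplanes w.x - b = 1 and w.x - b = -1."""
    return 2.0 / norm(w)

w, b = [3.0, 4.0], 2.0   # hypothetical; ||w|| = 5
x_on_plus = [1.0, 0.0]   # 3*1 + 4*0 - 2 = 1, so x lies on w.x - b = +1
print(distance_to_plane(w, b, -1.0, x_on_plus))  # 0.4
print(margin_width(w))                            # 0.4
```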

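So, to maximise the margin, we minimise $$\|w\|$$ subject to every point being correctly classified and lying on the correct side of its margin hyperplane:

$$y_i(w \cdot x_i - b) \ge 1 \quad \text{for all } i = 1, \dots, n$$

The $$w$$ and $$b$$ that solve this problem define the classifier $$x \mapsto \operatorname{sgn}(w \cdot x - b)$$.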
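The sketch below makes the constrained problem concrete. It checks the hard-margin constraints for a few hand-picked $$(w, b)$$ pairs and keeps the feasible pair with the smallest $$\|w\|$$. The toy data and the candidate pairs are hypothetical. This only illustrates the objective and constraints: an actual SVC solver optimises over all $$w$$ and $$b$$, typically by solving a quadratic program.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def satisfies_hard_margin(w, b, X, y):
    """True if y_i (w.x_i - b) >= 1 for every training point."""
    return all(yi * (dot(w, xi) - b) >= 1 for xi, yi in zip(X, y))

# Toy linearly separable data (hypothetical).
X = [[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]

# Hand-picked candidates only; a real solver searches over all (w, b).
candidates = [([0.5, 0.0], 0.5), ([1.0, 0.0], 1.0), ([2.0, 0.0], 2.0)]
feasible = [(w, b) for w, b in candidates if satisfies_hard_margin(w, b, X, y)]

# Among feasible candidates, the smallest ||w|| gives the widest margin.
best_w, best_b = min(feasible, key=lambda wb: math.sqrt(dot(wb[0], wb[0])))
print(best_w, best_b)  # [1.0, 0.0] 1.0
```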

### Support vectors

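Support vectors are the training points that lie on the margin hyperplanes, i.e. those with $$y_i(w \cdot x_i - b) = 1$$. They alone determine the maximum-margin hyperplane. Removing any other training point leaves the solution unchanged.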
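Consider the toy data below, which is hypothetical. Its maximum-margin hyperplane is $$x_1 = 1$$, i.e. $$w = (1, 0)$$ and $$b = 1$$. Here that hyperplane is the perpendicular bisector of the closest pair of opposite-class points, $$(0, 0)$$ and $$(2, 0)$$. The sketch picks out the support vectors as the points whose constraint holds with equality, up to a tolerance `tol`. The value of `tol` and the function name are illustrative assumptions.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def support_vectors(w, b, X, y, tol=1e-9):
    """Training points whose constraint y_i (w.x_i - b) >= 1 holds with equality."""
    return [xi for xi, yi in zip(X, y) if abs(yi * (dot(w, xi) - b) - 1.0) <= tol]

X = [[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 1.0  # maximum-margin hyperplane for this toy data
print(support_vectors(w, b, X, y))  # [[2.0, 0.0], [0.0, 0.0]]
```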

### Soft-margin SVC

#### Soft margin

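When the data are not linearly separable, no $$w$$ and $$b$$ satisfy all the hard-margin constraints. Instead, we allow points to violate the margin and penalise each violation with the hinge loss

$$\max(0, 1 - y_i(w \cdot x_i - b))$$

which is zero when $$x_i$$ lies on the correct side of its margin hyperplane and grows linearly with the size of the violation otherwise.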
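The sketch below transcribes the hinge loss directly and shows its three regimes. A point outside the margin on the correct side has zero loss. A correctly classified point inside the margin has a loss between 0 and 1. A misclassified point has a loss above 1. The function name and the hyperplane $$w = (1, 0)$$, $$b = 1$$ are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def hinge_loss(w, b, x, y):
    """max(0, 1 - y (w.x - b)) for a single labelled point."""
    return max(0.0, 1.0 - y * (dot(w, x) - b))

w, b = [1.0, 0.0], 1.0  # hypothetical hyperplane x1 = 1
print(hinge_loss(w, b, [3.0, 0.0], 1))  # 0.0: correct side, outside the margin
print(hinge_loss(w, b, [1.5, 0.0], 1))  # 0.5: correct side, inside the margin
print(hinge_loss(w, b, [0.0, 0.0], 1))  # 2.0: misclassified
```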

We then minimise

$$\lambda \|w\|^2 + \left[\dfrac{1}{n}\sum_{i=1}^n \max\left(0, 1 - y_i(w \cdot x_i - b)\right)\right]$$
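The objective can be evaluated directly, as in the sketch below. It assumes vectors are plain Python lists. The data, $$w$$, $$b$$ and $$\lambda$$ are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def soft_margin_objective(w, b, X, y, lam):
    """lambda * ||w||^2 + (1/n) * sum of hinge losses over the training set."""
    regulariser = lam * dot(w, w)
    total_hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) - b)) for xi, yi in zip(X, y))
    return regulariser + total_hinge / len(X)

# Hypothetical data: the second point (label +1) lies on the negative side of x1 = 1.
X = [[2.0, 0.0], [0.5, 0.0], [0.0, 0.0]]
y = [1, 1, -1]
w, b, lam = [1.0, 0.0], 1.0, 0.25
print(soft_margin_objective(w, b, X, y, lam))  # 0.75
```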

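This introduces $$\lambda > 0$$ as a regularisation parameter that trades off margin width against margin violations. A larger $$\lambda$$ favours a smaller $$\|w\|$$, giving a wider margin, at the cost of more violations. For linearly separable data and sufficiently small $$\lambda$$, the soft-margin SVC behaves like the hard-margin SVC.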
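One simple way to minimise this objective is full-batch subgradient descent, shown below as a sketch rather than as the method SVM libraries use. The hinge loss is not differentiable at its kink, so a subgradient is used in place of the gradient. The step size `eta`, the number of steps, the value of `lam` and the toy data are arbitrary assumptions. The function returns the best iterate seen, because the last iterate of a fixed-step subgradient method can oscillate.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def objective(w, b, X, y, lam):
    """lambda * ||w||^2 + mean hinge loss (the soft-margin objective)."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) - b)) for xi, yi in zip(X, y))
    return lam * dot(w, w) + hinge / len(X)

def train_soft_margin(X, y, lam=0.01, eta=0.01, steps=5000):
    """Full-batch subgradient descent; returns the (w, b) with the lowest objective seen."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    best = (objective(w, b, X, y, lam), list(w), b)
    for _ in range(steps):
        # Gradient of lambda * ||w||^2 is 2 * lambda * w.
        grad_w = [2.0 * lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) - b) < 1.0:  # hinge term is active for this point
                # d/dw of 1 - y (w.x - b) is -y x; d/db is +y.
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b += yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
        f = objective(w, b, X, y, lam)
        if f < best[0]:
            best = (f, list(w), b)
    return best[1], best[2]

X = [[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]
w, b = train_soft_margin(X, y)
print([1 if dot(w, xi) - b >= 0 else -1 for xi in X])  # [1, 1, -1, -1]
```

Subgradient descent keeps the update rule close to the formula above. Dedicated solvers converge faster and more precisely.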

## Non-linear Support Vector Classifiers

### The kernel trick

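The SVC optimisation problem can be rewritten in a dual form in which the training points appear only through dot products $$x_i \cdot x_j$$. The kernel trick replaces each of these dot products with a kernel function $$k(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$$. This equals a dot product after mapping the points into a (possibly much higher-dimensional) feature space with $$\varphi$$, but never computes $$\varphi$$ explicitly. A linear separator in that feature space corresponds to a non-linear decision boundary in the original space.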
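The sketch below checks the kernel trick for the homogeneous degree-2 polynomial kernel $$k(x, z) = (x \cdot z)^2$$. For 2-D inputs its explicit feature map is $$\varphi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$$. The sketch also defines the Gaussian (RBF) kernel, whose feature space is infinite-dimensional, so it can only be used through the kernel function. The function names, the 2-D inputs and the default `gamma` are illustrative assumptions.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel (x . z)^2."""
    return dot(x, z) ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs, chosen so that phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

x, z = [1.0, 2.0], [3.0, 4.0]
print(poly2_kernel(x, z))                         # 121.0
print(dot(poly2_features(x), poly2_features(z)))  # 121.0 up to floating-point rounding
```

The kernel needs one 2-D dot product, while the explicit route builds 3-D feature vectors first. The gap widens rapidly for higher degrees and dimensions.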