We want to create a hyperplane to separate classes.
For a classification problem with data \((x_i, y_i)\), where \(y_i \in \{-1, 1\}\)
The hyperplane is \(w \cdot x - b = 0\)
If data is linearly separable then a hyperplane exists such that all data can be correctly classified
There are infinitely many hyperplanes that could work.
We select two parallel hyperplanes with the distance between them as large as possible; the region between them is the margin.
The maximum-margin hyperplane is the one lying halfway between the two margin planes.
We can rescale \(w\) and \(b\) so that the two hyperplanes become:
\(w \cdot x - b = 1\)
\(w \cdot x - b = -1\)
The distance between the two parallel hyperplanes is \(\dfrac{2}{||w||}\)
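The distance formula follows by projecting the gap between the two planes onto the unit normal \(w/||w||\); a short derivation:

\[
\text{Take } x_1 \text{ with } w \cdot x_1 - b = 1 \text{ and } x_2 \text{ with } w \cdot x_2 - b = -1.
\]
\[
w \cdot (x_1 - x_2) = (b + 1) - (b - 1) = 2
\]
\[
\text{distance} = \frac{|w \cdot (x_1 - x_2)|}{||w||} = \frac{2}{||w||}
\]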
So we minimise \(||w||\) subject to every point being correctly classified:
\(y_i(w \cdot x_i - b) \ge 1\) for all \(i\)
We select w and b to solve this.
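As a concrete check of the constraint and the margin width, here is a minimal sketch on a toy dataset; the hyperplane \((w, b)\) below is a hand-picked assumption, not the output of any solver:

```python
import numpy as np

# Toy linearly separable data (assumed for illustration)
X = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

# Hand-picked separating hyperplane w.x - b = 0
w = np.array([1.0, 0.0])
b = 1.0

# Hard-margin constraint: y_i (w.x_i - b) >= 1 for every point
margins = y * (X @ w - b)
print(margins)                 # every entry should be >= 1

# Width of the margin, 2 / ||w||
print(2 / np.linalg.norm(w))
```

Any \((w, b)\) satisfying all the constraints is feasible; minimising \(||w||\) among the feasible pairs maximises the margin width printed above.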
Support vectors are the data points that lie on the margin boundaries; they alone determine the classifier.
Soft margin
Data may not be linearly separable, so we introduce a hinge loss function
\(\max(0, 1 - y_i(w \cdot x_i - b))\)
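The hinge loss is zero for points correctly classified outside the margin and grows linearly with the margin violation. A small sketch of the three regimes, with an assumed \((w, b)\):

```python
import numpy as np

# Assumed hyperplane for illustration
w, b = np.array([1.0, 0.0]), 0.0

cases = [
    (np.array([3.0, 0.0]),  1),   # correct, outside the margin -> loss 0
    (np.array([0.5, 0.0]),  1),   # correct but inside the margin -> small loss
    (np.array([-1.0, 0.0]), 1),   # misclassified -> large loss
]

# Hinge loss max(0, 1 - y (w.x - b)) for each case
losses = [max(0.0, 1.0 - yi * (w @ x - b)) for x, yi in cases]
print(losses)  # [0.0, 0.5, 2.0]
```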
We then minimise
\(\lambda ||w||^2 + \dfrac{1}{n}\sum_{i=1}^n \max(0, 1 - y_i(w \cdot x_i - b))\)
This introduces \(\lambda\) as a regularisation parameter that trades off margin width against hinge-loss penalty.
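The soft-margin objective can be minimised by subgradient descent. A minimal sketch, where the data, \(\lambda\), learning rate, and epoch count are all assumptions chosen for illustration:

```python
import numpy as np

# Toy linearly separable data (assumed)
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

lam, lr, epochs = 0.01, 0.1, 200   # assumed hyperparameters
n = len(y)
w, b = np.zeros(2), 0.0

for _ in range(epochs):
    margins = y * (X @ w - b)
    violated = margins < 1         # points with non-zero hinge loss
    # Subgradient of lambda ||w||^2 + (1/n) sum max(0, 1 - y_i (w.x_i - b))
    grad_w = 2 * lam * w - (y[violated, None] * X[violated]).sum(axis=0) / n
    grad_b = y[violated].sum() / n  # d/db of 1 - y(w.x - b) is +y
    w -= lr * grad_w
    b -= lr * grad_b

pred = np.sign(X @ w - b)
print((pred == y).mean())  # training accuracy
```

Points with margin at least 1 contribute nothing to the gradient, so once the data are separated only the regularisation term keeps shrinking \(w\), widening the margin.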
We can use kernels in place of the dot product to handle data that is not linearly separable in the original space.
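In the dual form the decision function is \(f(x) = \sum_i \alpha_i y_i (x_i \cdot x) - b\), so every dot product can be swapped for a kernel evaluation \(K(x_i, x)\). A sketch with an RBF kernel; the dual coefficients \(\alpha_i\) below are hypothetical values set by hand, not fitted:

```python
import numpy as np

def rbf(u, v, gamma=0.5):
    # RBF kernel K(u, v) = exp(-gamma ||u - v||^2)
    return np.exp(-gamma * np.sum((u - v) ** 2))

# Two assumed support vectors with hypothetical dual coefficients
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, -1])
a = np.array([1.0, 1.0])   # hypothetical alpha_i, not fitted
b = 0.0

def decide(x):
    # Kernelised decision function: dot products replaced by K(x_i, x)
    return sum(a[i] * y[i] * rbf(X[i], x) for i in range(len(y))) - b

print(np.sign(decide(np.array([0.9, 1.1]))))    # point near the +1 class
print(np.sign(decide(np.array([-1.2, -0.8]))))  # point near the -1 class
```

The classifier never needs the feature map itself, only kernel values, which is what makes non-linear boundaries cheap to evaluate.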