We want to create a hyperplane to separate classes.
For a classification problem with data \((x_i, y_i)\), where \(y_i \in \{-1, 1\}\)
The hyperplane is \(w \cdot x - b = 0\)
If data is linearly separable then a hyperplane exists such that all data can be correctly classified
There are infinitely many hyperplanes that could work.
We select two parallel hyperplanes with the distance between them as large as possible; the region between them is the margin.
The maximum-margin hyperplane is the one lying halfway between the two margin planes.
We can rescale \(w\) and \(b\) so that the two hyperplanes become:
\(w \cdot x - b = 1\)
\(w \cdot x - b = -1\)
The distance between the two parallel hyperplanes is \(\dfrac{2}{||w||}\)
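The distance formula follows by projecting the gap between the two planes onto the unit normal \(w/||w||\); a short derivation:

\[
\text{Take } x_1 \text{ with } w \cdot x_1 - b = 1 \text{ and } x_2 \text{ with } w \cdot x_2 - b = -1.
\]
\[
w \cdot (x_1 - x_2) = (b + 1) - (b - 1) = 2
\]
\[
\text{distance} = \frac{|w \cdot (x_1 - x_2)|}{||w||} = \frac{2}{||w||}
\]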
So we minimise \(||w||\) subject to every point being correctly classified:
\(y_i(w \cdot x_i - b) \ge 1\) for all \(i\)
We select w and b to solve this.
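As a concrete check of the constraint and the margin width, here is a minimal sketch on a toy dataset; the hyperplane \((w, b)\) below is a hand-picked assumption, not the output of any solver:

```python
import numpy as np

# Toy linearly separable data (assumed for illustration)
X = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

# Hand-picked separating hyperplane w.x - b = 0
w = np.array([1.0, 0.0])
b = 1.0

# Hard-margin constraint: y_i (w.x_i - b) >= 1 for every point
margins = y * (X @ w - b)
print(margins)                 # every entry should be >= 1

# Width of the margin, 2 / ||w||
print(2 / np.linalg.norm(w))
```

Any \((w, b)\) satisfying all the constraints is feasible; minimising \(||w||\) among the feasible pairs maximises the margin width printed above.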
Support vectors are the data points that lie on the margin boundaries; they alone determine the classifier.
Soft margin
Data may not be linearly separable, so we introduce a hinge loss function
\(\max(0, 1 - y_i(w \cdot x_i - b))\)
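The hinge loss is zero for points correctly classified outside the margin and grows linearly with the margin violation. A small sketch of the three regimes, with an assumed \((w, b)\):

```python
import numpy as np

# Assumed hyperplane for illustration
w, b = np.array([1.0, 0.0]), 0.0

cases = [
    (np.array([3.0, 0.0]),  1),   # correct, outside the margin -> loss 0
    (np.array([0.5, 0.0]),  1),   # correct but inside the margin -> small loss
    (np.array([-1.0, 0.0]), 1),   # misclassified -> large loss
]

# Hinge loss max(0, 1 - y (w.x - b)) for each case
losses = [max(0.0, 1.0 - yi * (w @ x - b)) for x, yi in cases]
print(losses)  # [0.0, 0.5, 2.0]
```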
We then minimise
\(\lambda ||w||^2 + \dfrac{1}{n}\sum_{i=1}^n \max(0, 1 - y_i(w \cdot x_i - b))\)
This introduces \(\lambda\) as a regularisation parameter that trades off margin width against hinge-loss penalty.
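The soft-margin objective can be minimised by subgradient descent. A minimal sketch, where the data, \(\lambda\), learning rate, and epoch count are all assumptions chosen for illustration:

```python
import numpy as np

# Toy linearly separable data (assumed)
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

lam, lr, epochs = 0.01, 0.1, 200   # assumed hyperparameters
n = len(y)
w, b = np.zeros(2), 0.0

for _ in range(epochs):
    margins = y * (X @ w - b)
    violated = margins < 1         # points with non-zero hinge loss
    # Subgradient of lambda ||w||^2 + (1/n) sum max(0, 1 - y_i (w.x_i - b))
    grad_w = 2 * lam * w - (y[violated, None] * X[violated]).sum(axis=0) / n
    grad_b = y[violated].sum() / n  # d/db of 1 - y(w.x - b) is +y
    w -= lr * grad_w
    b -= lr * grad_b

pred = np.sign(X @ w - b)
print((pred == y).mean())  # training accuracy
```

Points with margin at least 1 contribute nothing to the gradient, so once the data are separated only the regularisation term keeps shrinking \(w\), widening the margin.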
We can use kernels in place of the dot product to handle data that is not linearly separable in the original space.
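In the dual form the decision function is \(f(x) = \sum_i \alpha_i y_i (x_i \cdot x) - b\), so every dot product can be swapped for a kernel evaluation \(K(x_i, x)\). A sketch with an RBF kernel; the dual coefficients \(\alpha_i\) below are hypothetical values set by hand, not fitted:

```python
import numpy as np

def rbf(u, v, gamma=0.5):
    # RBF kernel K(u, v) = exp(-gamma ||u - v||^2)
    return np.exp(-gamma * np.sum((u - v) ** 2))

# Two assumed support vectors with hypothetical dual coefficients
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, -1])
a = np.array([1.0, 1.0])   # hypothetical alpha_i, not fitted
b = 0.0

def decide(x):
    # Kernelised decision function: dot products replaced by K(x_i, x)
    return sum(a[i] * y[i] * rbf(X[i], x) for i in range(len(y))) - b

print(np.sign(decide(np.array([0.9, 1.1]))))    # point near the +1 class
print(np.sign(decide(np.array([-1.2, -0.8]))))  # point near the -1 class
```

The classifier never needs the feature map itself, only kernel values, which is what makes non-linear boundaries cheap to evaluate.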