Linear Support Vector Machines in the Linearly Separable Case
Problem Description
Assume we have a learning set of data $\mathcal{L} = \{(\mathbf{x}_i, y_i)\}_{i=1}^n$, where $\mathbf{x}_i \in \mathbb{R}^p$ and $y_i \in \{-1, +1\}$. The binary classification problem is to use $\mathcal{L}$ to construct a function $f : \mathbb{R}^p \to \mathbb{R}$ so that
$$C(\mathbf{x}) = \operatorname{sign}\bigl(f(\mathbf{x})\bigr)$$
is a classifier.
If $\mathcal{L}$ is linearly separable, take $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$; then the optimization problem is given by
$$\min_{\mathbf{w},\, b} \ \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, n.$$
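As a quick numerical illustration, the sketch below checks the hard-margin constraints $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1$ for a candidate hyperplane on a tiny hypothetical 2-D dataset; the data points and the candidate $(\mathbf{w}, b)$ are illustrative assumptions, not part of the derivation.

```python
# Check hard-margin feasibility of a candidate hyperplane on toy data.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical linearly separable set with labels y_i in {-1, +1}.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [+1, +1, -1, -1]

w, b = (1.0, 0.0), 0.0  # candidate hyperplane w·x + b = 0

# Feasibility requires y_i (w·x_i + b) >= 1 for every training point.
margins = [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]
feasible = all(m >= 1.0 for m in margins)

# Geometric margin of this candidate hyperplane: 2 / ||w||.
norm_w = dot(w, w) ** 0.5
print(feasible, 2.0 / norm_w)
```

Any $(\mathbf{w}, b)$ passing this check separates the data; the optimization above picks, among all such hyperplanes, the one with the smallest $\|\mathbf{w}\|$, i.e. the largest margin $2/\|\mathbf{w}\|$.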
Primal Problem via Lagrange Multipliers
By introducing Lagrange multipliers, the primal function is given by
$$L_P(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^n \alpha_i \bigl[y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1\bigr],$$
where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)^\top$ with $\alpha_i \ge 0$ is the vector of Lagrange multipliers. So the primal problem is equivalent to
$$\min_{\mathbf{w},\, b} \ \max_{\boldsymbol{\alpha} \ge \mathbf{0}} \ L_P(\mathbf{w}, b, \boldsymbol{\alpha}).$$
The Karush–Kuhn–Tucker (KKT) conditions give necessary and sufficient conditions for a solution of this constrained optimization problem (sufficiency holds here because the objective is convex and the constraints are affine):
The KKT conditions yield
$$\frac{\partial L_P}{\partial \mathbf{w}} = \mathbf{0} \ \Longrightarrow\ \mathbf{w} = \sum_{i=1}^n \alpha_i y_i \mathbf{x}_i, \qquad \frac{\partial L_P}{\partial b} = 0 \ \Longrightarrow\ \sum_{i=1}^n \alpha_i y_i = 0,$$
together with
$$y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1 \ge 0, \qquad \alpha_i \ge 0, \qquad \alpha_i \bigl[y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1\bigr] = 0, \qquad i = 1, \dots, n,$$
and $b$ is implicitly determined by the KKT complementarity condition: choose any $i$ for which $\alpha_i > 0$ and compute $b = y_i - \mathbf{w}^\top \mathbf{x}_i$ (note that it is numerically safer to take the mean of the values of $b$ resulting from all such equations).
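The recovery of $b$ by averaging over the points with positive multipliers can be sketched as follows; the toy data and multipliers below are a hypothetical worked example (the optimal $\alpha_i$ for this two-point problem can be found in closed form), not output of a real solver.

```python
# Recover b from complementarity: average b_i = y_i - w·x_i over all
# support vectors (alpha_i > 0), which is numerically safer than using one.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
alpha = [0.5, 0.5]  # assumed dual solution for this toy problem

# w = sum_i alpha_i y_i x_i  (KKT stationarity condition)
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]

# Average y_i - w·x_i over indices with alpha_i > 0.
sv = [i for i, a in enumerate(alpha) if a > 1e-9]
b = sum(y[i] - dot(w, X[i]) for i in sv) / len(sv)
print(w, b)
```

Averaging matters in practice because each individual equation $b = y_i - \mathbf{w}^\top \mathbf{x}_i$ holds only up to the numerical tolerance of the solver that produced the $\alpha_i$.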
Substituting the KKT stationarity conditions into the primal function simplifies it to the dual function
$$L_D(\boldsymbol{\alpha}) = \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j \mathbf{x}_i^\top \mathbf{x}_j,$$
where $\alpha_i \ge 0$ and $\sum_{i=1}^n \alpha_i y_i = 0$.
Dual Problem
When the KKT conditions are satisfied, the primal problem is equivalent to the dual problem
$$\max_{\boldsymbol{\alpha}} \ L_D(\boldsymbol{\alpha}) \quad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^n \alpha_i y_i = 0,$$
i.e.
$$\max_{\boldsymbol{\alpha}} \ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j \mathbf{x}_i^\top \mathbf{x}_j \quad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^n \alpha_i y_i = 0.$$
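The dual objective is easy to evaluate directly. The sketch below does so on a hypothetical two-point problem whose optimal multipliers are known in closed form, and compares the objective there with another feasible point; the data and both $\boldsymbol{\alpha}$ vectors are assumptions for illustration.

```python
# Evaluate L_D(alpha) = sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j x_i·x_j
# and check dual feasibility (alpha_i >= 0, sum_i alpha_i y_i = 0).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]

def dual_objective(alpha):
    linear = sum(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(len(X)) for j in range(len(X)))
    return linear - 0.5 * quad

def feasible(alpha):
    nonneg = all(a >= 0 for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) < 1e-12
    return nonneg and balanced

best = [0.5, 0.5]   # assumed optimum for this toy problem
other = [0.3, 0.3]  # another feasible point, for comparison
print(feasible(best), feasible(other),
      dual_objective(best), dual_objective(other))
```

For this toy problem the dual optimum equals $\frac{1}{2}\|\hat{\mathbf{w}}\|^2 = 0.5$, matching the primal optimum, as strong duality requires.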
If $\hat{\boldsymbol{\alpha}}$ solves this optimization problem, then
$$\hat{\mathbf{w}} = \sum_{i=1}^n \hat{\alpha}_i y_i \mathbf{x}_i.$$
By the KKT complementarity condition, $\hat{\alpha}_i > 0$ only if $y_i(\hat{\mathbf{w}}^\top \mathbf{x}_i + \hat{b}) = 1$, so
$$\hat{\mathbf{w}} = \sum_{i \in \mathrm{sv}} \hat{\alpha}_i y_i \mathbf{x}_i,$$
where $\mathrm{sv} = \{\, i : \hat{\alpha}_i > 0 \,\}$ is the set of support vectors.
Then the decision function is
$$\hat{f}(\mathbf{x}) = \hat{\mathbf{w}}^\top \mathbf{x} + \hat{b} = \sum_{i \in \mathrm{sv}} \hat{\alpha}_i y_i \, \mathbf{x}_i^\top \mathbf{x} + \hat{b}.$$
Since, using complementarity ($y_i \hat{\mathbf{w}}^\top \mathbf{x}_i = 1 - y_i \hat{b}$ for $i \in \mathrm{sv}$) and the constraint $\sum_i \hat{\alpha}_i y_i = 0$,
$$\|\hat{\mathbf{w}}\|^2 = \sum_{i \in \mathrm{sv}} \hat{\alpha}_i \, y_i \hat{\mathbf{w}}^\top \mathbf{x}_i = \sum_{i \in \mathrm{sv}} \hat{\alpha}_i (1 - y_i \hat{b}) = \sum_{i \in \mathrm{sv}} \hat{\alpha}_i,$$
the maximum margin is given by
$$\frac{2}{\|\hat{\mathbf{w}}\|} = \frac{2}{\sqrt{\sum_{i \in \mathrm{sv}} \hat{\alpha}_i}}.$$
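The identity $\|\hat{\mathbf{w}}\|^2 = \sum_{i \in \mathrm{sv}} \hat{\alpha}_i$ can be verified numerically; the sketch below uses the same hypothetical two-point problem, with assumed optimal multipliers, as a sanity check.

```python
# Verify ||w||^2 = sum of the optimal alphas, hence margin = 2 / sqrt(sum alpha).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
alpha = [0.5, 0.5]  # assumed optimal multipliers for this toy problem

# w = sum_i alpha_i y_i x_i
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]

norm_sq = dot(w, w)            # should equal sum(alpha)
margin = 2.0 / norm_sq ** 0.5  # maximum margin 2 / ||w||
print(norm_sq, sum(alpha), margin)
```

Here $\|\hat{\mathbf{w}}\|^2 = 1 = \hat{\alpha}_1 + \hat{\alpha}_2$, giving a maximum margin of $2$, which matches the geometric distance between the two points.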