# Support Vector Machines¶

Sanjiv R. Das

## What is an SVM?¶

The goal of an SVM is to map a set of entities with inputs $X=\{x_1,x_2,\ldots,x_n\}$ of dimension $n$, i.e., $X \in R^n$, into a set of categories $Y=\{y_1,y_2,\ldots,y_m\}$ of dimension $m$, such that the $n$-dimensional $X$-space is divided using hyperplanes that produce the maximal separation between the classes in $Y$. A hyperplane is the set of points ${\bf x}$ satisfying the equation

$${\bf w} \cdot {\bf x} = b$$

where $b$ is a scalar constant, and ${\bf w} \in R^n$ is the normal vector to the hyperplane, i.e., the vector at right angles to the plane. The distance between this hyperplane and the parallel hyperplane ${\bf w} \cdot {\bf x} = 0$ is given by $|b|/||{\bf w}||$, where $||{\bf w}||$ is the norm of the vector ${\bf w}$.
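As a quick sanity check, here is a minimal numerical verification of this distance formula, with ${\bf w}$ and $b$ chosen purely for illustration (they are not from the text).

```python
# Minimal sketch: verify that the distance between w.x = b and w.x = 0
# equals |b| / ||w||, using an illustrative w and b in R^2.
import numpy as np

w = np.array([3.0, 4.0])   # normal vector, ||w|| = 5
b = 10.0                   # hyperplane: w . x = b

# Closed-form distance from the formula above.
dist_formula = abs(b) / np.linalg.norm(w)

# Cross-check: the foot of the perpendicular from the origin onto
# w.x = b is (b / ||w||^2) * w; its length is the distance.
foot = (b / np.dot(w, w)) * w
dist_projection = np.linalg.norm(foot)

print(dist_formula, dist_projection)  # both print 2.0
```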

Figure: $H_3$ is the best separating hyperplane.

## Hyperplane Geometry¶

• Suppose we have two categories of data, i.e., $Y = \{y_1, y_2\}$.
• Assume that all points in category $y_1$ lie above a hyperplane ${\bf w} \cdot {\bf x} = b_1$, and all points in category $y_2$ lie below a hyperplane ${\bf w} \cdot {\bf x} = b_2$.
• Then the distance between the two hyperplanes is $\frac{|b_1-b_2|}{||{\bf w}||}$, as verified numerically in the sketch below.
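The same kind of numerical check works for the two-hyperplane distance; again, ${\bf w}$, $b_1$, and $b_2$ below are illustrative values, not from the text.

```python
# Minimal sketch: verify that two parallel hyperplanes w.x = b1 and
# w.x = b2 are |b1 - b2| / ||w|| apart, using illustrative values.
import numpy as np

w = np.array([3.0, 4.0])   # shared normal vector, ||w|| = 5
b1, b2 = 20.0, 10.0        # the two hyperplanes

dist_formula = abs(b1 - b2) / np.linalg.norm(w)

# Cross-check: take a point x0 on w.x = b2 and measure its
# perpendicular distance to w.x = b1, i.e., |w.x0 - b1| / ||w||.
x0 = (b2 / np.dot(w, w)) * w
dist_point = abs(np.dot(w, x0) - b1) / np.linalg.norm(w)

print(dist_formula, dist_point)  # both print 2.0
```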

## Regularization¶

• Of course, there may be no hyperplane that perfectly separates the two groups. Hence, we allow slippage through nonnegative slack variables $\eta_i$, combined with L2 regularization of ${\bf w}$:
$$\min_{b_1,b_2,{\bf w},\{\eta_i\}} \frac{1}{2} ||{\bf w}||^2 + C_1 \sum_{i \in y_1} \eta_i + C_2 \sum_{i \in y_2} \eta_i$$
• subject to ${\bf w} \cdot {\bf x}_i \geq b_1 - \eta_i$ for observations in group $y_1$, ${\bf w} \cdot {\bf x}_i \leq b_2 + \eta_i$ for observations in group $y_2$, and $\eta_i \geq 0$ (with a normalization such as $b_1 = b + 1$, $b_2 = b - 1$ to pin down the scale).
• where $C_1, C_2$ are the costs for slippage in groups 1 and 2, respectively.
• Often, implementations assume $C_1 = C_2 = C$, as in the scikit-learn sketch below.
• The values $\eta_i$ are positive for observations that are not perfectly separated, i.e., that lead to slippage, and zero otherwise.
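As a sketch of the soft-margin problem in practice, the snippet below fits scikit-learn's `LinearSVC`, which assumes a single slack cost $C$ (i.e., $C_1 = C_2$); the two-group toy data are generated here purely for illustration.

```python
# Soft-margin linear SVM sketch with scikit-learn (assumes C1 = C2 = C).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))    # group y = +1
X2 = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(50, 2))  # group y = -1
X = np.vstack([X1, X2])
y = np.array([1] * 50 + [-1] * 50)

# A small C tolerates more slippage (larger total slack); a large C
# penalizes slack heavily, approaching the hard-margin solution.
svm = LinearSVC(C=1.0)
svm.fit(X, y)

w, b = svm.coef_[0], svm.intercept_[0]    # fitted hyperplane: w.x + b = 0
nominal_margin = 2.0 / np.linalg.norm(w)  # width between w.x + b = +/-1
print(w, b, nominal_margin)
```

Refitting with, say, `C=100.0` typically shrinks the reported margin, since heavier slack penalties push the solution toward the hard-margin fit.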