Derive hinge loss from SVM

What is hinge loss

The hinge loss is a loss function used to train machine learning classifiers. It is defined as

L(\hat{y}) = \max(0, 1 - y\hat{y})      (1)

where y \in \{-1, 1\} is the class label and \hat{y} is the output of our classifier.
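To make equation (1) concrete, here is a minimal sketch in Python with NumPy. The function name `hinge_loss` and the sample labels and scores are made up for illustration; they are not from any particular library.

```python
import numpy as np

def hinge_loss(y, y_hat):
    """Elementwise hinge loss max(0, 1 - y * y_hat) for labels y in {-1, +1}."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.maximum(0.0, 1.0 - y * y_hat)

# Hypothetical labels and classifier outputs, chosen only for illustration.
labels = np.array([1, 1, -1, -1])
scores = np.array([2.0, 0.5, -0.5, 1.0])

losses = hinge_loss(labels, scores)
print(losses)  # [0.  0.5 0.5 2. ]
```

The loss is zero only when a point is on the correct side with y\hat{y} of at least 1. A correct but low-confidence output (0.5 for a positive example) is still penalized, and a wrong output is penalized linearly.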

However, the SVM formulation I know looks like this:

\min_{W, b, \xi} \frac{1}{2}\|W\|^{2} + C\sum_{i=1}^{N}\xi_{i}      (2)

s.t.    \xi_{i} \geq 0, \quad y_{i}(x_{i}^{T}W + b) \geq 1 - \xi_{i}, \quad \forall i

So what is the relation between the two? Are they just two ways of looking at the same model?
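As a quick numerical preview, the sketch below fixes W and b and computes the smallest slack \xi_{i} that satisfies both constraints in (2). That smallest slack works out to \max(0, 1 - y_{i}(x_{i}^{T}W + b)), which has the same form as (1). The data, the values of W, b and C, and the helper names are all hypothetical, and this is a check on one example rather than a derivation.

```python
import numpy as np

def smallest_feasible_slack(X, y, W, b):
    """Smallest xi_i with xi_i >= 0 and y_i (x_i^T W + b) >= 1 - xi_i, for fixed W, b."""
    margins = y * (X @ W + b)
    return np.maximum(0.0, 1.0 - margins)

def svm_objective(W, xi, C):
    """Objective (2): 0.5 * ||W||^2 + C * sum_i xi_i."""
    return 0.5 * np.dot(W, W) + C * np.sum(xi)

# Hypothetical toy data and parameters, chosen only for illustration.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
W = np.array([1.0, 0.0])
b = 0.0
C = 1.0

xi = smallest_feasible_slack(X, y, W, b)
margins = y * (X @ W + b)

# The slack satisfies both constraints of (2).
feasible = bool(np.all(xi >= 0) and np.all(margins >= 1.0 - xi))

# Shrinking any strictly positive slack breaks the margin constraint.
shrunk = xi - 1e-3 * (xi > 0)
shrunk_feasible = bool(np.all(margins >= 1.0 - shrunk))

constrained_value = svm_objective(W, xi, C)
hinge_value = 0.5 * np.dot(W, W) + C * np.sum(np.maximum(0.0, 1.0 - margins))

print(xi)                              # [0.  0.5 0.5 2. ]
print(feasible, shrunk_feasible)       # True False
print(constrained_value, hinge_value)  # 3.5 3.5
```

On this example, the objective in (2) evaluated at the smallest feasible slack equals the regularized sum of hinge losses, which hints that the two formulations describe the same model.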
