Losses

Below we enumerate the loss functions implemented by ERM and give their mathematical definitions. Some loss functions (e.g., HuberLoss) accept parameters.

Mathematical definitions

| name | ERM Loss | mathematical definition (assuming scalar targets) | notes |
| --- | --- | --- | --- |
| squared | SquareLoss() | $l^{\mathrm{sqr}}(\widehat y, y) = (\widehat y - y)^2$ | n/a |
| absolute | AbsoluteLoss() | $l^{\mathrm{abs}}(\widehat y, y) = \lvert \widehat y - y \rvert$ | n/a |
| tilted | TiltedLoss() | $l^{\mathrm{tlt}}(\widehat y, y) = \tau(\widehat y - y)_+ + (1 - \tau)(\widehat y - y)_{-}$ | $0 < \tau < 1$ |
| deadzone | DeadzoneLoss() | $l^{\mathrm{dz}}(\widehat y, y) = \max(\lvert \widehat y - y \rvert - \alpha, 0)$ | $\alpha \geq 0$ |
| Huber | HuberLoss() | $l^{\mathrm{hub}}(\widehat y, y) = \begin{cases} (\widehat y - y)^2 & \lvert \widehat y - y \rvert \leq \alpha \\ \alpha(2\lvert \widehat y - y \rvert - \alpha) & \lvert \widehat y - y \rvert > \alpha \end{cases}$ | $\alpha \geq 0$ |
| log Huber | LogHuberLoss() | $l^{\mathrm{dh}}(\widehat y, y) = \begin{cases} (\widehat y - y)^2 & \lvert \widehat y - y \rvert \leq \alpha \\ \alpha^2(1 + 2(\log\lvert \widehat y - y \rvert - \log\alpha)) & \lvert \widehat y - y \rvert > \alpha \end{cases}$ | $\alpha \geq 0$ |
| hinge | HingeLoss() | $l^{\mathrm{hng}}(\widehat y, y) = \max(1 - \widehat y y, 0)$ | n/a |
| logistic | LogisticLoss() | $l^{\mathrm{lgt}}(\widehat y, y) = \log(1 + \exp(-\widehat y y))$ | n/a |
| sigmoid | SigmoidLoss() | $l^{\mathrm{sigm}}(\widehat y, y) = 1/(1 + \exp(\widehat y y))$ | n/a |
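As a quick numerical companion to the definitions above, here is a minimal Python sketch of each loss for scalar targets. It is a standalone illustration and does not use the ERM package; the function names are hypothetical. It rests on two assumptions: $(u)_+ = \max(u, 0)$ and $(u)_- = \max(-u, 0)$ in the tilted loss, and the outer branches of the Huber and log Huber losses depend on the residual $|\widehat y - y|$, so that both branches equal $\alpha^2$ at $|\widehat y - y| = \alpha$.

```python
import math


def square_loss(y_hat, y):
    return (y_hat - y) ** 2


def absolute_loss(y_hat, y):
    return abs(y_hat - y)


def tilted_loss(y_hat, y, tau):
    # Assumed convention: (u)_+ = max(u, 0) and (u)_- = max(-u, 0).
    r = y_hat - y
    return tau * max(r, 0.0) + (1.0 - tau) * max(-r, 0.0)


def deadzone_loss(y_hat, y, alpha):
    return max(abs(y_hat - y) - alpha, 0.0)


def huber_loss(y_hat, y, alpha):
    # Quadratic for small residuals, linear beyond alpha.
    r = abs(y_hat - y)
    return r ** 2 if r <= alpha else alpha * (2.0 * r - alpha)


def log_huber_loss(y_hat, y, alpha):
    # Quadratic for small residuals, logarithmic beyond alpha.
    # The logarithmic branch needs alpha > 0, since it evaluates log(alpha).
    r = abs(y_hat - y)
    if r <= alpha:
        return r ** 2
    return alpha ** 2 * (1.0 + 2.0 * (math.log(r) - math.log(alpha)))


def hinge_loss(y_hat, y):
    return max(1.0 - y_hat * y, 0.0)


def logistic_loss(y_hat, y):
    # log1p(x) computes log(1 + x).
    return math.log1p(math.exp(-y_hat * y))


def sigmoid_loss(y_hat, y):
    return 1.0 / (1.0 + math.exp(y_hat * y))
```

With these definitions, huber_loss(y + alpha, y, alpha) evaluates to alpha ** 2 through the quadratic branch, and the linear branch approaches the same value as the residual decreases toward alpha.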

The EE104 lecture slides are a good reference for loss functions; the lecture on non-quadratic losses is particularly helpful.

Passing parameters

Some of the loss functions above accept parameters. To pass a parameter, provide it as the only argument to the Loss constructor. For example, to set $\alpha$ for $l^{\mathrm{hub}}$, instantiate the loss with HuberLoss(alpha), where alpha >= 0.
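The single-argument constructor pattern can be mimicked in a few lines of Python. The class below is a hypothetical stand-in written for illustration and is not the ERM type: it only shows a parameter being fixed at construction time and then used on every evaluation. The validation of alpha >= 0 and the callable interface are assumptions of this sketch.

```python
class HuberLoss:
    """Hypothetical stand-in (not the ERM type) that stores alpha at construction."""

    def __init__(self, alpha):
        # Mirror the documented constraint alpha >= 0.
        if alpha < 0:
            raise ValueError("alpha must satisfy alpha >= 0")
        self.alpha = alpha

    def __call__(self, y_hat, y):
        # Quadratic for residuals up to alpha, linear beyond it.
        r = abs(y_hat - y)
        if r <= self.alpha:
            return r ** 2
        return self.alpha * (2.0 * r - self.alpha)


# The parameter is the only constructor argument.
loss = HuberLoss(1.0)
small = loss(1.5, 1.0)  # residual 0.5, quadratic branch
large = loss(3.0, 1.0)  # residual 2.0, linear branch
```

Fixing the parameter once at construction keeps it separate from the data, so the same loss object can be reused for every prediction and target pair.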