

Below we enumerate the loss functions implemented by ERM and give their mathematical definitions. Some loss functions (e.g., HuberLoss) accept parameters.

Mathematical definitions

| name | ERM loss | mathematical definition (assuming scalar targets) | notes |
| --- | --- | --- | --- |
| squared | SquareLoss() | $l^{\mathrm{sqr}}(\widehat{y}, y) = (\widehat{y} - y)^2$ | n/a |
| absolute | AbsoluteLoss() | $l^{\mathrm{abs}}(\widehat{y}, y) = \lvert \widehat{y} - y \rvert$ | n/a |
| tilted | TiltedLoss() | $l^{\mathrm{tlt}}(\widehat{y}, y) = \tau(\widehat{y} - y)_+ + (1 - \tau)(\widehat{y} - y)_{-}$ | $0 < \tau < 1$ |
| deadzone | DeadzoneLoss() | $l^{\mathrm{dz}}(\widehat{y}, y) = \max(\lvert \widehat{y} - y \rvert - \alpha, 0)$ | $\alpha \geq 0$ |
| Huber | HuberLoss() | $l^{\mathrm{hub}}(\widehat{y}, y) = \begin{cases} (\widehat{y} - y)^2 & \lvert \widehat{y} - y \rvert \leq \alpha \\\\ \alpha(2\lvert \widehat{y} - y \rvert - \alpha) & \lvert \widehat{y} - y \rvert > \alpha \end{cases}$ | $\alpha \geq 0$ |
| log Huber | LogHuberLoss() | $l^{\mathrm{dh}}(\widehat{y}, y) = \begin{cases} (\widehat{y} - y)^2 & \lvert \widehat{y} - y \rvert \leq \alpha \\\\ \alpha^2(1 + 2(\log \lvert \widehat{y} - y \rvert - \log \alpha)) & \lvert \widehat{y} - y \rvert > \alpha \end{cases}$ | $\alpha \geq 0$ |
| hinge | HingeLoss() | $l^{\mathrm{hng}}(\widehat{y}, y) = \max(1 - \widehat{y} y, 0)$ | n/a |
| logistic | LogisticLoss() | $l^{\mathrm{lgt}}(\widehat{y}, y) = \log(1 + \exp(-\widehat{y} y))$ | n/a |
| sigmoid | SigmoidLoss() | $l^{\mathrm{sigm}}(\widehat{y}, y) = 1/(1 + \exp(\widehat{y} y))$ | n/a |
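
To make the definitions concrete, here is a minimal Python sketch that evaluates each loss on scalar inputs. It is a standalone illustration, not ERM's implementation: the function names are hypothetical, and it assumes the conventions $(u)_+ = \max(u, 0)$ and $(u)_- = \max(-u, 0)$ for the tilted loss. The Huber and log Huber branches are written in terms of the residual magnitude $|\widehat{y} - y|$, so that both pieces agree at $|\widehat{y} - y| = \alpha$.

```python
import math

# Standalone sketch of the scalar loss definitions; all names are hypothetical
# and chosen for illustration only (this is not ERM's code).

def square(yhat, y):
    return (yhat - y) ** 2

def absolute(yhat, y):
    return abs(yhat - y)

def tilted(yhat, y, tau):
    # Assumes (u)_+ = max(u, 0) and (u)_- = max(-u, 0), with 0 < tau < 1.
    u = yhat - y
    return tau * max(u, 0.0) + (1 - tau) * max(-u, 0.0)

def deadzone(yhat, y, alpha):
    # Residuals with magnitude at most alpha cost nothing.
    return max(abs(yhat - y) - alpha, 0.0)

def huber(yhat, y, alpha):
    # Quadratic for small residuals, linear beyond the threshold alpha.
    r = abs(yhat - y)
    return r ** 2 if r <= alpha else alpha * (2 * r - alpha)

def log_huber(yhat, y, alpha):
    # Quadratic for small residuals, logarithmic growth beyond alpha.
    r = abs(yhat - y)
    if r <= alpha:
        return r ** 2
    return alpha ** 2 * (1 + 2 * (math.log(r) - math.log(alpha)))

# The remaining losses depend on the product yhat * y, as is usual for
# classification with labels y in {-1, +1}.

def hinge(yhat, y):
    return max(1 - yhat * y, 0.0)

def logistic(yhat, y):
    return math.log(1 + math.exp(-yhat * y))

def sigmoid(yhat, y):
    return 1 / (1 + math.exp(yhat * y))
```

For example, with $\alpha = 1$ a residual of 3 gives `huber(3.0, 0.0, 1.0) == 5.0`, compared with 9 under the squared loss, which shows how the Huber loss reduces the influence of large residuals.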

The EE104 lecture slides are a good reference for loss functions; the lecture on non-quadratic losses is particularly helpful.

Passing parameters

Some of the loss functions above accept parameters. To pass a parameter, provide it as the only argument to the loss constructor. For example, to set $\alpha$ for $l^{\mathrm{hub}}$, instantiate the loss with HuberLoss(alpha), where alpha >= 0.
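
The single-argument pattern can be pictured with a small Python stand-in. This is only a sketch of the idea: the class below is hypothetical and is not ERM's `HuberLoss`. It stores `alpha` at construction, rejects negative values, and evaluates the Huber loss with linear branch $\alpha(2|\widehat{y} - y| - \alpha)$.

```python
class HuberLoss:
    """Hypothetical stand-in for a parameterized loss (not ERM's implementation)."""

    def __init__(self, alpha):
        # The parameter is the only constructor argument and must satisfy alpha >= 0.
        if alpha < 0:
            raise ValueError("alpha must be nonnegative")
        self.alpha = alpha

    def __call__(self, yhat, y):
        # Quadratic within the threshold, linear outside it.
        r = abs(yhat - y)
        if r <= self.alpha:
            return r ** 2
        return self.alpha * (2 * r - self.alpha)


loss = HuberLoss(1.0)
print(loss(3.0, 0.0))  # prints 5.0
```

Fixing the parameter at construction time means the resulting object can be used anywhere a two-argument loss of $(\widehat{y}, y)$ is expected.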