Biased Linear Models

For a linear model that involves many features, the training data can easily be over-fitted, which severely degrades the model's performance in the prediction phase. Elastic-net regularization is often adopted to tackle this issue, resulting in a biased but more robust model of the training data.

Elastic-Net Regularization

Mathematically, the elastic-net penalty term is formulated as follows:

\[\lambda \big(\alpha \|\beta\|_1 + (1-\alpha)\frac{1}{2}\|\beta\|_2^2\big),\]

where \(\beta\) is the vector of coefficients in the linear model (excluding the intercept), \(\lambda\) is the penalty weight, and \(\alpha\) is the mixing weight between the pure LASSO penalty (i.e. \(\|\cdot\|_1\), recovered when \(\alpha = 1\)) and the pure Ridge penalty (i.e. \(\|\cdot\|_2^2\), recovered when \(\alpha = 0\)).
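As a quick illustration (not part of hana-ml itself), the penalty can be evaluated directly from this formula. The minimal numpy sketch below uses the same symbols, with lamb standing in for \(\lambda\):

    import numpy as np

    def elastic_net_penalty(beta, lamb, alpha):
        # lambda * (alpha * ||beta||_1 + (1 - alpha) * (1/2) * ||beta||_2^2)
        l1 = np.sum(np.abs(beta))        # LASSO part: ||beta||_1
        l2 = 0.5 * np.sum(beta ** 2)     # Ridge part: (1/2) * ||beta||_2^2
        return lamb * (alpha * l1 + (1 - alpha) * l2)

    beta = np.array([0.5, -1.2, 0.0, 3.0])
    print(elastic_net_penalty(beta, lamb=0.1, alpha=0.5))
    # alpha=1.0 reduces to the pure LASSO penalty, alpha=0.0 to the pure Ridge penalty.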

The Ridge penalty shrinks the magnitude of the coefficients in the linear model, but it does not set any of them exactly to zero; in comparison, the LASSO penalty not only shrinks the magnitude of the coefficients but also sets some originally small ones exactly to zero. Therefore, when sparse models are desired, the LASSO penalty is preferred. However, since the LASSO penalty is non-smooth, the choice of numerical optimization algorithms for LASSO-regularized linear models is quite restricted: plain gradient-based methods do not apply directly, so methods such as coordinate descent or ADMM are typically used instead. Users should therefore pay attention to the choice of numerical solver when the LASSO penalty is applied.
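The sparsity effect can be seen concretely. Since this section does not show hana-ml's own fitting API, the sketch below uses scikit-learn purely as a stand-in (an assumption for illustration, not hana-ml's interface); note that scikit-learn's alpha parameter is the overall penalty weight, i.e. \(\lambda\) above, not the mixing weight:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    # Synthetic data: 50 features, only 5 of which actually drive the target.
    X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                           noise=5.0, random_state=0)

    ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks, but keeps coefficients nonzero
    lasso = Lasso(alpha=1.0).fit(X, y)   # drives uninformative coefficients to zero

    print("zero coefficients (Ridge):", int(np.sum(ridge.coef_ == 0.0)))
    print("zero coefficients (LASSO):", int(np.sum(lasso.coef_ == 0.0)))

Incidentally, scikit-learn's Lasso estimator is itself solved by coordinate descent, one of the solver families suited to the non-smooth penalty.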

In hana-ml, elastic-net regularization is supported by the following models/algorithms: