OnlineLinearRegression

class hana_ml.algorithms.pal.linear_model.OnlineLinearRegression(enet_lambda=None, enet_alpha=None, max_iter=None, tol=None)

Online linear regression (stateless) is an online version of linear regression, used when the training data arrive in multiple rounds. Each round of training supplies additional data. By combining the current linear model with the data obtained in each round, online linear regression adapts the model to make predictions as precise as possible.
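
To make the idea concrete, the following is a minimal sketch of online least squares for a single feature. It is not PAL's implementation: the class name `RunningLeastSquares`, its fields, and the choice of running sums as the state carried between rounds are assumptions made purely for illustration. Each round folds new data into a few sums, and the model is recomputed from those sums without revisiting earlier rounds.

```python
from dataclasses import dataclass


@dataclass
class RunningLeastSquares:
    """Hypothetical illustration of y = intercept + slope * x.

    The state is a handful of running sums; this is NOT the format of
    PAL's intermediate result.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def partial_fit(self, xs, ys):
        # Fold one round of data into the running sums.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def coefficients(self):
        # Closed-form ordinary least squares from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


model = RunningLeastSquares()
model.partial_fit([0.0, 1.0], [1.0, 3.0])  # round 1
model.partial_fit([2.0, 3.0], [5.0, 7.0])  # round 2
intercept, slope = model.coefficients()    # data follow y = 1 + 2x
```

The fit after round 2 is the same as a batch fit on all four points, which is the property an online method aims for.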

Note

Online Linear Regression (stateless) is currently supported in SAP HANA Cloud. The stateful version of Online Linear Regression available in SAP HANA SPS05/06 is not yet supported in hana-ml.

Parameters:

enet_lambda : float, optional

Penalized weight. Value should be greater than or equal to 0.

Defaults to 0.

enet_alpha : float, optional

Elastic net mixing parameter. Ranges from 0 (Ridge penalty) to 1 (LASSO penalty), inclusive.

Defaults to 0.
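
As a rough illustration of how the two parameters interact, the sketch below computes an elastic net penalty using the common glmnet-style convention. This convention is an assumption: PAL's exact scaling of the L1 and L2 terms may differ, and the function name `elastic_net_penalty` is hypothetical.

```python
def elastic_net_penalty(coefs, enet_lambda=0.0, enet_alpha=0.0):
    """Penalty under the assumed convention
    lambda * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    """
    if enet_lambda < 0:
        raise ValueError("enet_lambda must be >= 0")
    if not 0 <= enet_alpha <= 1:
        raise ValueError("enet_alpha must lie in [0, 1]")
    l1 = sum(abs(c) for c in coefs)   # LASSO term
    l2 = sum(c * c for c in coefs)    # Ridge term
    return enet_lambda * (enet_alpha * l1 + 0.5 * (1 - enet_alpha) * l2)


coefs = [1.0, -2.0]
ridge_only = elastic_net_penalty(coefs, enet_lambda=0.1, enet_alpha=0.0)
lasso_only = elastic_net_penalty(coefs, enet_lambda=0.1, enet_alpha=1.0)
mixed = elastic_net_penalty(coefs, enet_lambda=0.1, enet_alpha=0.5)
```

With `enet_lambda=0` (the default) the penalty vanishes regardless of `enet_alpha`, so the fit reduces to ordinary least squares.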

max_iter : int, optional

Maximum number of iterations. Defaults to 1000.

tol : float, optional

Convergence threshold. Defaults to 1.0e-5.

Attributes:

intermediate_result_ : DataFrame

Intermediate model.

coefficients_ : DataFrame

Fitted regression coefficients.

Methods

`partial_fit`(data[, key, features, label, ...])	Online training based on each round of training data.
`predict`(data[, key, features])	Predict dependent variable values based on a fitted model.
`score`(data[, key, features, label])	Returns the coefficient of determination R2 of the prediction.

Examples

>>> onlinelr = OnlineLinearRegression(enet_lambda=0.1, enet_alpha=0.5, max_iter=1200, tol=1E-6)

In each round, call partial_fit() to train the model with a new DataFrame. For example, with df_1:

>>> onlinelr.partial_fit(data=df_1, key='ID', label='Y', features=['X1', 'X2'])
>>> onlinelr.coefficients_.collect()
>>> onlinelr.intermediate_result_.collect()
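
When several rounds of data are available, the same call can be repeated in order. The helper below is a minimal sketch of that pattern; `fit_in_rounds` is a hypothetical convenience function, not part of hana-ml, and it assumes only that the model exposes `partial_fit(data=..., **kwargs)` as documented here.

```python
def fit_in_rounds(model, rounds, **fit_kwargs):
    """Call model.partial_fit once per DataFrame in `rounds`, in order.

    `fit_kwargs` (for example key, label, features) are passed unchanged
    to every call. Returns the same model object.
    """
    for round_df in rounds:
        model.partial_fit(data=round_df, **fit_kwargs)
    return model


# Possible usage against a live SAP HANA connection (df_2 and df_3 are
# hypothetical DataFrames holding later rounds of training data):
#
# fit_in_rounds(onlinelr, [df_1, df_2, df_3],
#               key='ID', label='Y', features=['X1', 'X2'])
```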

Perform predict():

>>> onlinelr.predict(data=df_predict, key='ID', features=['X1', 'X2']).collect()

Perform score():

>>> onlinelr.score(data=df_score, key='ID', label='Y', features=['X1', 'X2'])

partial_fit(data, key=None, features=None, label=None, thread_ratio=None)

Online training based on each round of training data.

Parameters:

data : DataFrame

Training data.

key : str, optional

Name of the ID column.

If key is not provided, then:

if data is indexed by a single column, then key defaults to that index column;
otherwise, it is assumed that data contains no ID column.

features : list of str, optional

Names of the feature columns.

If features is not provided, it defaults to all non-ID, non-label columns.

label : str, optional

Name of the dependent variable.

If label is not provided, it defaults to the last column.

thread_ratio : float, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside this range tell PAL to heuristically determine the number of threads to use.

Defaults to 0.0.

Returns:

A fitted object of class "OnlineLinearRegression".

predict(data, key=None, features=None)

Predict dependent variable values based on a fitted model.

Parameters:

data : DataFrame

Independent variable values to predict for.

key : str, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

features : list of str, optional

Names of the feature columns.

If features is not provided, it defaults to all non-ID columns.

Returns:

DataFrame

Predicted values, structured as follows:

ID column: with the same name and type as data's ID column.

VALUE: type DOUBLE, representing predicted values.

score(data, key=None, features=None, label=None)

Returns the coefficient of determination R2 of the prediction.

Parameters:

data : DataFrame

Data on which to assess model performance.

key : str, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

features : list of str, optional

Names of the feature columns.

If features is not provided, it defaults to all non-ID, non-label columns.

label : str, optional

Name of the dependent variable.

If label is not provided, it defaults to the last column.

Returns:

float: Returns the coefficient of determination R2 of the prediction.
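
For reference, the sketch below computes the coefficient of determination using its standard definition, R2 = 1 - SS_res / SS_tot. It is an illustration only: the function name `r2_score` is hypothetical, and it is assumed, not confirmed by this page, that score() follows the same definition.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r2_score(actual, predicted)
```

A value of 1 means the predictions match the actual values exactly; a value near 0 means the model does no better than always predicting the mean of the label.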

Inherited Methods from PALBase

Besides the methods described above, the OnlineLinearRegression class also inherits methods from the PALBase class. Refer to PAL Base for more details.