OnlineLinearRegression
- class hana_ml.algorithms.pal.linear_model.OnlineLinearRegression(enet_lambda=None, enet_alpha=None, max_iter=None, tol=None)
Online linear regression (stateless) is an online version of linear regression, used when the training data are obtained over multiple rounds. Additional data are obtained in each round of training. By combining the currently computed linear model with the data obtained in each round, online linear regression adapts the model to make its predictions as precise as possible.
Note
Online Linear Regression (stateless) is currently supported in SAP HANA Cloud. The stateful version of Online Linear Regression available in SAP HANA SPS05/06 is not yet supported in hana-ml.
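To make the round-by-round idea concrete, here is a minimal sketch in plain Python. It is not PAL's implementation: it assumes a single feature and no penalty, and the names `Stats`, `update`, and `solve` are hypothetical. The caller carries a small summary of all earlier rounds and folds each new round into it, which is one way to read "stateless" here (an interpretation, not something the documentation states).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Running sums that summarize every round seen so far (hypothetical helper)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def update(stats, xs, ys):
    """Fold one round of (x, y) observations into the running sums."""
    return Stats(
        n=stats.n + len(xs),
        sx=stats.sx + sum(xs),
        sy=stats.sy + sum(ys),
        sxx=stats.sxx + sum(x * x for x in xs),
        sxy=stats.sxy + sum(x * y for x, y in zip(xs, ys)),
    )


def solve(stats):
    """Return (intercept, slope) of the least-squares line for all data seen so far."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


# Two rounds of data drawn from y = 2x + 1.
stats = update(Stats(), [0.0, 1.0], [1.0, 3.0])
stats = update(stats, [2.0, 3.0], [5.0, 7.0])
intercept, slope = solve(stats)
```

Only the running sums are kept between rounds, so earlier rows never need to be revisited.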
- Parameters:
- enet_lambda : float, optional
Penalty weight. The value should be greater than or equal to 0.
Defaults to 0.
- enet_alpha : float, optional
Elastic net mixing parameter. Ranges from 0 (Ridge penalty) to 1 (LASSO penalty), inclusive.
Defaults to 0.
- max_iter : int, optional
Maximum number of iterations. Defaults to 1000.
- tol : float, optional
Convergence threshold. Defaults to 1.0e-5.
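The interaction between enet_lambda and enet_alpha can be illustrated with a small sketch. It assumes the common glmnet-style elastic net convention, in which the ridge term carries a factor of 1/2; whether PAL scales the terms exactly this way is an assumption. The function name `elastic_net_penalty` is hypothetical.

```python
def elastic_net_penalty(coefficients, enet_lambda=0.0, enet_alpha=0.0):
    """Penalty added to the loss for the given coefficients.

    Assumed form: lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared).
    """
    if enet_lambda < 0:
        raise ValueError("enet_lambda must be greater than or equal to 0")
    if not 0 <= enet_alpha <= 1:
        raise ValueError("enet_alpha must lie in [0, 1]")
    l1 = sum(abs(w) for w in coefficients)
    l2_squared = sum(w * w for w in coefficients)
    return enet_lambda * (enet_alpha * l1 + 0.5 * (1 - enet_alpha) * l2_squared)


weights = [1.0, -2.0]
ridge_only = elastic_net_penalty(weights, enet_lambda=0.1, enet_alpha=0.0)
lasso_only = elastic_net_penalty(weights, enet_lambda=0.1, enet_alpha=1.0)
mixed = elastic_net_penalty(weights, enet_lambda=0.1, enet_alpha=0.5)
```

With enet_alpha at 0 only the squared (Ridge) term remains, and with enet_alpha at 1 only the absolute-value (LASSO) term remains.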
Examples
>>> onlinelr = OnlineLinearRegression(enet_lambda=0.1, enet_alpha=0.5, max_iter=1200, tol=1E-6)
In each round, you can invoke partial_fit() to train the model with a new DataFrame. An example using df_1 is shown below.
>>> onlinelr.partial_fit(data=df_1, key='ID', label='Y', features=['X1', 'X2'])
>>> onlinelr.coefficients_.collect()
>>> onlinelr.intermediate_result_.collect()
Perform predict():
>>> onlinelr.predict(data=df_predict, key='ID', features=['X1', 'X2']).collect()
Perform score():
>>> onlinelr.score(data=df_score, key='ID', label='Y', features=['X1', 'X2'])
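The round-by-round use of partial_fit() shown in the examples above can be wrapped in a small loop. This is a sketch only: `train_in_rounds` is a hypothetical helper, not part of hana_ml, and it relies solely on the partial_fit() keyword arguments documented on this page.

```python
def train_in_rounds(model, rounds, key, label, features):
    """Call model.partial_fit() once for each round of training data.

    `rounds` is any iterable of DataFrames, one per training round.
    """
    for round_df in rounds:
        model.partial_fit(data=round_df, key=key, label=label, features=features)
    return model


# Hypothetical usage, assuming df_2 and df_3 are later rounds shaped like df_1:
# train_in_rounds(onlinelr, [df_1, df_2, df_3],
#                 key='ID', label='Y', features=['X1', 'X2'])
```

After the loop, coefficients_ reflects the model after the last round that was processed.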
- Attributes:
- intermediate_result_ : DataFrame
Intermediate model.
- coefficients_ : DataFrame
Fitted regression coefficients.
Methods
partial_fit(data[, key, features, label, ...])
Online training based on each round of training data.
predict(data[, key, features])
Predict dependent variable values based on a fitted model.
score(data[, key, features, label])
Returns the coefficient of determination R^2 of the prediction.
- partial_fit(data, key=None, features=None, label=None, thread_ratio=None)
Online training based on each round of training data.
- Parameters:
- data : DataFrame
Training data.
- key : str, optional
Name of the ID column.
If key is not provided, then:
if data is indexed by a single column, then key defaults to that index column;
otherwise, it is assumed that data contains no ID column.
- features : a list of str, optional
Names of the feature columns.
If features is not provided, it defaults to all non-ID, non-label columns.
- label : str, optional
Name of the dependent variable.
If label is not provided, it defaults to the last column.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.0.
- Returns:
- A fitted object of class "OnlineLinearRegression".
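The defaulting rules for key, features, and label described in the parameters above can be sketched in plain Python. `resolve_columns` is a hypothetical helper, not a hana_ml function, and it assumes that "the last column" means the last column that is not the ID column.

```python
def resolve_columns(columns, index=None, key=None, features=None, label=None):
    """Return (key, features, label) after applying the documented defaults.

    `columns` is the ordered list of column names. `index` is the name of the
    single index column, or None when the data has no single-column index.
    """
    if key is None:
        # May stay None: the data is then treated as having no ID column.
        key = index
    remaining = [c for c in columns if c != key]
    if label is None:
        # Assumption: "last column" means the last non-ID column.
        label = remaining[-1]
    if features is None:
        features = [c for c in remaining if c != label]
    return key, list(features), label


resolved = resolve_columns(['ID', 'X1', 'X2', 'Y'], index='ID')
```

Passing key, features, and label explicitly, as in the class examples, avoids depending on column order.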
- predict(data, key=None, features=None)
Predict dependent variable values based on a fitted model.
- Parameters:
- data : DataFrame
Independent variable values to predict for.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : a list of str, optional
Names of the feature columns.
If features is not provided, it defaults to all non-ID columns.
- Returns:
- DataFrame
Predicted values, structured as follows:
ID column: with the same name and type as data's ID column.
VALUE: type DOUBLE, representing predicted values.
- score(data, key=None, features=None, label=None)
Returns the coefficient of determination R^2 of the prediction.
- Parameters:
- data : DataFrame
Data on which to assess model performance.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : a list of str, optional
Names of the feature columns.
If features is not provided, it defaults to all non-ID, non-label columns.
- label : str, optional
Name of the dependent variable.
If label is not provided, it defaults to the last column.
- Returns:
- float
Returns the coefficient of determination R^2 of the prediction.
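For reference, the coefficient of determination is conventionally defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes that standard definition on plain Python lists. `r_squared` is a hypothetical helper, and the page does not confirm that score() evaluates it in exactly this way.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A model that always predicts the mean of the observed values scores 0, and a perfect fit scores 1.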
Inherited Methods from PALBase
Besides the methods described above, the OnlineLinearRegression class also inherits methods from the PALBase class. Please refer to PAL Base for more details.