BiVariateGeometricRegression
- class hana_ml.algorithms.pal.regression.BiVariateGeometricRegression(decomposition=None, adjusted_r2=None, pmml_export=None, thread_ratio=0.0)
Geometric regression is an approach used to model the relationship between a scalar variable y and a variable denoted X. In geometric regression, data is modeled using geometric functions, and unknown model parameters are estimated from the data. Such models are called geometric models.
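As a rough illustration of the idea, the sketch below fits a geometric model in plain Python. It assumes the model has the power form y = b0 * x**b1 and estimates the two parameters by ordinary least squares on the log-transformed data; this page states neither the model form nor the fitting procedure PAL uses, so both are assumptions. The helper names fit_geometric and predict_geometric are hypothetical and are not part of hana_ml.

```python
import math


def fit_geometric(xs, ys):
    """Fit y = b0 * x**b1 by least squares on (ln x, ln y).

    Assumption: this log-log linearization is only an illustration,
    not necessarily what PAL does internally.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(v <= 0 for v in xs) or any(v <= 0 for v in ys):
        raise ValueError("the log transform needs strictly positive x and y")

    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n

    sxx = sum((a - mean_lx) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((a - mean_lx) * (b - mean_ly) for a, b in zip(lx, ly))

    b1 = sxy / sxx                          # slope in log space = exponent
    b0 = math.exp(mean_ly - b1 * mean_lx)   # intercept in log space = ln(b0)
    return b0, b1


def predict_geometric(b0, b1, xs):
    """Evaluate the fitted power model at new x values."""
    return [b0 * x ** b1 for x in xs]


# Noise-free data generated from y = 3 * x**0.5, so the fit should recover it.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 0.5 for x in xs]
b0, b1 = fit_geometric(xs, ys)
preds = predict_geometric(b0, b1, [16.0])
```

Fitting in log space minimizes relative rather than absolute error, so the result can differ from a direct nonlinear fit when the data is noisy.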
- Parameters:
- decomposition : {'LU', 'QR', 'SVD', 'Cholesky'}, optional
Matrix factorization type to use. Case-insensitive.
'LU': LU decomposition.
'QR': QR decomposition.
'SVD': singular value decomposition.
'Cholesky': Cholesky (LDLT) decomposition.
Defaults to QR decomposition.
- adjusted_r2 : bool, optional
If true, include the adjusted R2 value in the statistics table.
Defaults to False.
- pmml_export : {'no', 'single-row', 'multi-row'}, optional
Controls whether to output a PMML representation of the model, and how to format the PMML. Case-insensitive.
'no' or not provided: No PMML model.
'single-row': Exports a PMML model in a maximum of one row. Fails if the model doesn't fit in one row.
'multi-row': Exports a PMML model, splitting it across multiple rows if it doesn't fit in one.
Prediction does not require a PMML model.
- thread_ratio : float, optional (deprecated)
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 uses a single thread, while 1 uses all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.
Defaults to 0.
Examples
>>> gr = BiVariateGeometricRegression(pmml_export='multi-row')
>>> gr.fit(data=df_train, key='ID')
Perform predict():
>>> gr.predict(data=df_predict, key='ID').collect()
- Attributes:
- coefficients_ : DataFrame
Fitted regression coefficients.
- pmml_ : DataFrame
PMML model. Set to None if no PMML model was requested.
- fitted_ : DataFrame
Predicted dependent variable values for training data. Set to None if the training data has no row IDs.
- statistics_ : DataFrame
Regression-related statistics, such as mean squared error.
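For orientation, the sketch below computes two statistics of the kind mentioned above: mean squared error and adjusted R2. It uses the textbook definitions, with MSE averaged over the number of samples and a single predictor for the bivariate case. Whether the values in statistics_ follow exactly these definitions is an assumption, and the function names are hypothetical, not part of hana_ml.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared residual (assumption: divides by n, not by degrees of freedom)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r2(r2, n_samples, n_predictors=1):
    """Textbook adjusted R2: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Residuals are 0, 0 and -2, so the squared errors sum to 4 over 3 samples.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# With one predictor and 11 samples, the penalty factor is 10 / 9.
adj = adjusted_r2(0.9, n_samples=11)
```

Adjusted R2 is never larger than R2 when R2 is at most 1, because the factor (n - 1) / (n - p - 1) is at least 1.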
Methods
fit(data[, key, features, label]): Fit the model to the training dataset.
get_model_metrics(): Get the model metrics.
get_score_metrics(): Get the score metrics.
predict(data[, key, features, model_format, ...]): Predict dependent variable values based on the fitted model.
score(data[, key, features, label]): Return the coefficient of determination R2 of the prediction.
- fit(data, key=None, features=None, label=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
Training data.
- key : str, optional
Name of the ID column.
If key is not provided: if data is indexed by a single column, key defaults to that index column; otherwise, data is assumed to contain no ID column.
- features : a list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column (this is not the PAL default).
- Returns:
- A fitted object of class "BiVariateGeometricRegression".
- predict(data, key=None, features=None, model_format=None, thread_ratio=0.0)
Predict dependent variable values based on fitted model.
- Parameters:
- data : DataFrame
Independent variable values used for prediction.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : a list of str, optional
Names of the feature columns.
- model_format : int or str, optional (deprecated)
0 or 'coefficient': use the coefficient table as the model for prediction.
1 or 'pmml': use the PMML table as the model for prediction.
Defaults to 'coefficient'.
Deprecated; this parameter no longer has any effect.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 uses a single thread, while 1 uses all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.
Defaults to 0.
- Returns:
- DataFrame
Predicted values, structured as follows:
ID column, with the same name and type as the ID column of data.
VALUE, type DOUBLE, representing predicted values.
Note
predict() passes the pmml_ table to PAL as the model representation if a pmml_ table exists, and the coefficients_ table otherwise.
- score(data, key=None, features=None, label=None)
Returns the coefficient of determination R2 of the prediction.
- Parameters:
- data : DataFrame
Data on which to assess model performance.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : a list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column (this is not the PAL default).
- Returns:
- float
The coefficient of determination R2 of the prediction on the given data.
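The coefficient of determination can be sketched in a few lines of plain Python. The version below uses the standard definition R2 = 1 - SS_res / SS_tot; it is shown only to make the returned number concrete and is an assumption about, not a copy of, what score() computes on the database side. The name r2_score is a hypothetical helper, not a hana_ml function.

```python
def r2_score(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions are from observations.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far observations are from their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r2_score(observed, predicted)

# A perfect prediction leaves no residual, so R2 is exactly 1.
perfect = r2_score(observed, observed)
```

A value near 1 means the predictions explain most of the variance in the observed values; a value at or below 0 means they do no better than predicting the mean.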
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
In addition to the methods described above, the BiVariateGeometricRegression class inherits methods from the PALBase class. Refer to PAL Base for more details.