BiVariateGeometricRegression

class hana_ml.algorithms.pal.regression.BiVariateGeometricRegression(decomposition=None, adjusted_r2=None, pmml_export=None, thread_ratio=0.0)

Geometric regression models the relationship between a scalar dependent variable y and an independent variable X. The data is modeled with a geometric function whose unknown parameters are estimated from the data. Such models are called geometric models.
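To make "estimated from the data" concrete, here is a minimal pure-Python sketch. It assumes the usual geometric (power-law) form y = b0 * x**b1, which this page does not spell out, and fits it by ordinary least squares on the log-transformed data. The function names `fit_geometric` and `predict_geometric` are hypothetical, and this is not necessarily how PAL solves the problem internally.

```python
import math


def fit_geometric(xs, ys):
    """Fit y = b0 * x**b1 by least squares on (ln x, ln y).

    Assumption: power-law model form; not taken from the PAL source.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(v <= 0 for v in xs) or any(v <= 0 for v in ys):
        raise ValueError("the log transform requires strictly positive x and y")

    log_x = [math.log(v) for v in xs]
    log_y = [math.log(v) for v in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    sxx = sum((a - mean_x) ** 2 for a in log_x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))

    # Slope of the log-log line is the exponent b1;
    # the intercept is ln(b0).
    b1 = sxy / sxx
    b0 = math.exp(mean_y - b1 * mean_x)
    return b0, b1


def predict_geometric(b0, b1, xs):
    """Evaluate the fitted model y = b0 * x**b1 for each x."""
    return [b0 * x ** b1 for x in xs]


# Noise-free data generated from y = 3 * x**0.5
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 0.5 for x in xs]
b0, b1 = fit_geometric(xs, ys)
```

Taking logarithms turns the model into ln y = ln b0 + b1 * ln x, which is linear in the unknowns. That is why a linear least-squares solve is sufficient in this sketch.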

Parameters:

decomposition : {'LU', 'QR', 'SVD', 'Cholesky'}, optional

Matrix factorization type to use. Case-insensitive.

'LU': LU decomposition.

'QR': QR decomposition.

'SVD': singular value decomposition.

'Cholesky': Cholesky (LDLT) decomposition.

Defaults to QR decomposition.

adjusted_r2 : bool, optional

If true, include the adjusted R2 value in the statistics table.

Defaults to False.

pmml_export : {'no', 'single-row', 'multi-row'}, optional

Controls whether to output a PMML representation of the model, and how to format the PMML. Case-insensitive.

'no' or not provided: No PMML model.

'single-row': Exports a PMML model in a single row. Fails if the model doesn't fit in one row.

'multi-row': Exports a PMML model, splitting it across multiple rows if it doesn't fit in one.

Prediction does not require a PMML model.

thread_ratio : float, optional (deprecated)

Specifies the fraction of available threads to use, from 0 to 1. A value of 0 uses a single thread, and 1 uses all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.

Defaults to 0.
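For reference, the value added by adjusted_r2 can be related to the plain R2 with the textbook formula below. This is a sketch of the standard definition. Whether PAL computes the statistic in exactly this form is an assumption, and the helper name `adjusted_r2_value` is hypothetical.

```python
def adjusted_r2_value(r2, n_samples, n_features=1):
    """Textbook adjusted R2: 1 - (1 - R2) * (n - 1) / (n - p - 1).

    n_features defaults to 1 because a bi-variate model has a
    single independent variable.
    """
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("n_samples must exceed n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


value = adjusted_r2_value(0.9, n_samples=12)
```

The adjustment penalizes R2 for the number of fitted predictors, so it is never larger than the unadjusted value when R2 is at most 1.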

Examples

>>> from hana_ml.algorithms.pal.regression import BiVariateGeometricRegression
>>> gr = BiVariateGeometricRegression(pmml_export='multi-row')
>>> gr.fit(data=df_train, key='ID')

Perform predict():

>>> gr.predict(data=df_predict, key='ID').collect()

Attributes:

coefficients_ (DataFrame): Fitted regression coefficients.
pmml_ (DataFrame): PMML model. Set to None if no PMML model was requested.
fitted_ (DataFrame): Predicted dependent variable values for training data. Set to None if the training data has no row IDs.
statistics_ (DataFrame): Regression-related statistics, such as mean squared error.

Methods

`fit`(data[, key, features, label])	Fit the model to the training dataset.
`get_model_metrics`()	Get the model metrics.
`get_score_metrics`()	Get the score metrics.
`predict`(data[, key, features, model_format, ...])	Predict dependent variable values based on fitted model.
`score`(data[, key, features, label])	Returns the coefficient of determination R2 of the prediction.

fit(data, key=None, features=None, label=None)

Fit the model to the training dataset.

Parameters:

data : DataFrame

Training data.

key : str, optional

Name of the ID column.

If key is not provided, then:

if data is indexed by a single column, then key defaults to that index column;

otherwise, it is assumed that data contains no ID column.

features : list of str, optional

Names of the feature columns.

label : str, optional

Name of the dependent variable.

Defaults to the last non-ID column (this is not the PAL default).

Returns:

A fitted object of class "BiVariateGeometricRegression".
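The defaulting rules for key and label described above can be sketched as a small standalone function. `resolve_columns` and its `index` argument are hypothetical and are not part of hana_ml. The default for features (all remaining non-ID, non-label columns) is an assumption, since this page does not state it.

```python
def resolve_columns(columns, key=None, features=None, label=None, index=None):
    """Hypothetical helper mirroring the documented defaults of fit().

    columns: ordered column names of the training data.
    index:   name of the single index column, or None if the data
             is not indexed by exactly one column.
    """
    if key is None and isinstance(index, str):
        key = index  # data is indexed by a single column

    non_id = [c for c in columns if c != key]
    if not non_id:
        raise ValueError("data has no non-ID columns")

    if label is None:
        label = non_id[-1]  # last non-ID column

    if features is None:
        # Assumption: every remaining non-ID, non-label column.
        features = [c for c in non_id if c != label]

    return key, features, label


indexed = resolve_columns(["ID", "X", "Y"], index="ID")
no_id = resolve_columns(["X", "Y"])
```

In this sketch, `indexed` resolves the key from the index, while `no_id` treats the data as having no ID column at all.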

predict(data, key=None, features=None, model_format=None, thread_ratio=0.0)

Predict dependent variable values based on fitted model.

Parameters:

data : DataFrame

Independent variable values used for prediction.

key : str, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

features : list of str, optional

Names of the feature columns.

model_format : int or str, optional (deprecated)

0 or 'coefficient': use the coefficient table as the model for prediction.
1 or 'pmml': use the PMML table as the model for prediction.

Defaults to 'coefficient'.

Deprecated; this parameter no longer has any effect.

thread_ratio : float, optional

Specifies the fraction of available threads to use, from 0 to 1. A value of 0 uses a single thread, and 1 uses all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.

Defaults to 0.
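One way to read the thread_ratio rules is sketched below. Only the endpoints (0 means one thread, 1 means all available threads) and the out-of-range behavior come from the description above. The linear scaling for values in between and the function name `threads_for_ratio` are assumptions made for illustration, not PAL's documented behavior.

```python
def threads_for_ratio(thread_ratio, available_threads):
    """Hypothetical mapping from thread_ratio to a thread count.

    Returns None when the ratio is out of range, standing in for
    "ignored, thread count chosen heuristically".
    """
    if not 0.0 <= thread_ratio <= 1.0:
        return None
    if thread_ratio == 0.0:
        return 1  # documented: 0 uses a single thread
    # Assumption: scale linearly, never dropping below one thread.
    return max(1, int(thread_ratio * available_threads))


single = threads_for_ratio(0.0, 8)
everything = threads_for_ratio(1.0, 8)
ignored = threads_for_ratio(1.5, 8)
```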

Returns:

DataFrame

Predicted values, structured as follows:

ID column, with the same name and type as the ID column of data.

VALUE, type DOUBLE, representing predicted values.

Note

predict() passes the pmml_ table to PAL as the model representation if one exists; otherwise, it passes the coefficients_ table.

score(data, key=None, features=None, label=None)

Returns the coefficient of determination R2 of the prediction.

Parameters:

data : DataFrame

Data on which to assess model performance.

key : str, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

features : list of str, optional

Names of the feature columns.

label : str, optional

Name of the dependent variable.

Defaults to the last non-ID column (this is not the PAL default).

Returns:

float: The coefficient of determination R2 of the prediction on the given data.
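As a reminder of what the returned number means, the coefficient of determination can be computed from true and predicted values as 1 - SS_res / SS_tot. The sketch below uses that standard definition in plain Python. `r2_score` is a hypothetical local helper, not the hana_ml implementation, which evaluates the score in the database.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
r2 = r2_score(y_true, y_pred)
```

A value of 1 means the predictions match the data exactly, and a value of 0 means the model does no better than predicting the mean of the dependent variable.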

get_model_metrics()

Get the model metrics.

Returns:

DataFrame: The model metrics.

get_score_metrics()

Get the score metrics.

Returns:

DataFrame: The score metrics.

Inherited Methods from PALBase

Besides the methods listed above, the BiVariateGeometricRegression class also inherits methods from the PALBase class. Refer to PAL Base for details.