BiVariateGeometricRegression
- class hana_ml.algorithms.pal.regression.BiVariateGeometricRegression(decomposition=None, adjusted_r2=None, pmml_export=None, thread_ratio=0.0)
Geometric regression is an approach used to model the relationship between a scalar variable y and a variable denoted X. In geometric regression, data is modeled using geometric functions, and unknown model parameters are estimated from the data. Such models are called geometric models.
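As a rough illustration of the idea, the sketch below fits a power-law curve y = b0 * x**b1 by ordinary least squares on log-transformed data. The functional form, the log transform, the helper name `fit_geometric`, and the sample data are assumptions made for illustration. The code is plain Python; it does not call PAL and does not reproduce PAL's numerical method (for example, the matrix decomposition options).

```python
import math

def fit_geometric(xs, ys):
    """Least-squares fit of y = b0 * x**b1 on log-transformed data.

    Illustrative only: requires strictly positive xs and ys.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the straight line ln(y) = ln(b0) + b1 * ln(x).
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b1 = sxy / sxx
    b0 = math.exp(mean_y - b1 * mean_x)
    return b0, b1

# Made-up data that follows y = 3 * x**2 exactly.
xs = [1, 2, 4, 8]
ys = [3, 12, 48, 192]
b0, b1 = fit_geometric(xs, ys)
print(f"b0={b0:.3f}, b1={b1:.3f}")
```

Because the sample data lies exactly on a power-law curve, the fit recovers the generating parameters up to floating-point rounding.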
- Parameters
- decomposition : {'LU', 'QR', 'SVD', 'Cholesky'}, optional
Matrix factorization type to use. Case-insensitive.
'LU': LU decomposition.
'QR': QR decomposition.
'SVD': singular value decomposition.
'Cholesky': Cholesky (LDLT) decomposition.
Defaults to QR decomposition.
- adjusted_r2 : bool, optional
If true, include the adjusted R² value in the statistics table.
Defaults to False.
- pmml_export : {'no', 'single-row', 'multi-row'}, optional
Controls whether to output a PMML representation of the model, and how to format the PMML. Case-insensitive.
'no' or not provided: No PMML model.
'single-row': Exports a PMML model in a maximum of one row. Fails if the model doesn't fit in one row.
'multi-row': Exports a PMML model, splitting it across multiple rows if it doesn't fit in one.
Prediction does not require a PMML model.
- thread_ratio : float, optional
Controls the proportion of available threads to use for fitting.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads.
Values between 0 and 1 will use that percentage of available threads.
Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.
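The thread_ratio rules above can be read as a small decision function. The sketch below is only an interpretation of that description: the helper `threads_for_ratio`, the rounding rule, and the use of `os.cpu_count()` as the number of "available threads" are assumptions, not PAL behavior.

```python
import os

def threads_for_ratio(thread_ratio, available=None):
    """Return the thread count implied by thread_ratio, or None when the
    value is out of range and the choice is left to a heuristic."""
    if available is None:
        available = os.cpu_count() or 1
    if not 0.0 <= thread_ratio <= 1.0:
        return None  # out of range: the number of threads is chosen heuristically
    # Assumption: round down, but never go below one thread.
    return max(1, int(thread_ratio * available))

print(threads_for_ratio(0.0, available=8))   # 0 means a single thread
print(threads_for_ratio(0.5, available=8))   # half of the available threads
print(threads_for_ratio(1.0, available=8))   # up to all available threads
print(threads_for_ratio(-1.0, available=8))  # out of range
```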
Examples
>>> df.collect()
   ID     Y  X1
0   0   1.1   1
1   1   4.2   2
2   2   8.9   3
3   3  16.3   4
4   4    24   5
Training the model:
>>> from hana_ml.algorithms.pal.regression import BiVariateGeometricRegression
>>> gr = BiVariateGeometricRegression(pmml_export='multi-row')
>>> gr.fit(data=df, key='ID')
Prediction:
>>> df2.collect()
   ID  X1
0   0   1
1   1   2
2   2   3
3   3   4
4   4   5
>>> gr.predict(data=df2, key='ID').collect()
   ID               VALUE
0   0                   1
1   1  3.9723699817481437
2   2   8.901666037549536
3   3  15.779723271893747
4   4   24.60086108408644
- Attributes
- coefficients_ : DataFrame
Fitted regression coefficients.
- pmml_ : DataFrame
PMML model. Set to None if no PMML model was requested.
- fitted_ : DataFrame
Predicted dependent variable values for training data. Set to None if the training data has no row IDs.
- statistics_ : DataFrame
Regression-related statistics, such as mean squared error.
Methods
fit(data[, key, features, label]): Fit regression model based on training data.
predict(data[, key, features, model_format, ...]): Predict dependent variable values based on fitted model.
score(data[, key, features, label]): Returns the coefficient of determination R² of the prediction.
- fit(data, key=None, features=None, label=None)
Fit regression model based on training data.
- Parameters
- data : DataFrame
Training data.
- key : str, optional
Name of the ID column.
If key is not provided, then: if data is indexed by a single column, key defaults to that index column; otherwise, data is assumed to contain no ID column.
- features : list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column (this is not the PAL default).
- Returns
- Fitted object.
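The defaulting rules for key and label can be sketched as a small pure-Python function. `resolve_columns` is a hypothetical helper written for illustration, not part of hana_ml. The default for features (every remaining non-ID, non-label column) is an assumption, since it is not stated above.

```python
def resolve_columns(columns, index=None, key=None, features=None, label=None):
    """Hypothetical helper mirroring the documented defaults of fit()."""
    # key: fall back to the index only when it is a single column name.
    if key is None and isinstance(index, str):
        key = index
    non_id = [c for c in columns if c != key]
    # label: the last non-ID column when not given.
    if label is None:
        label = non_id[-1]
    # features: assumed here to be every remaining non-ID, non-label column.
    if features is None:
        features = [c for c in non_id if c != label]
    return key, features, label

# Indexed by a single column: key defaults to that column.
print(resolve_columns(["ID", "X1", "Y"], index="ID"))
# No index and no key: the data is treated as having no ID column.
print(resolve_columns(["X1", "Y"]))
```

Since label defaults to the last non-ID column, passing label explicitly is the safer choice whenever the dependent variable is not the last column.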
- predict(data, key=None, features=None, model_format=None, thread_ratio=0.0)
Predict dependent variable values based on fitted model.
- Parameters
- data : DataFrame
Independent variable values used for prediction.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns. Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
- model_format : int or str, optional (deprecated)
0 or 'coefficient': use the coefficient table as the model for prediction.
1 or 'pmml': use the PMML table as the model for prediction.
Defaults to 'coefficient'.
Deprecated; this parameter no longer has any effect.
- thread_ratio : float, optional
Controls the proportion of available threads to use for prediction.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads.
Values between 0 and 1 will use that percentage of available threads.
Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.
- Returns
- DataFrame
Predicted values, structured as follows:
ID column, with the same name and type as data's ID column.
VALUE, type DOUBLE, representing predicted values.
Note
predict() will pass the pmml_ table to PAL as the model representation if there is a pmml_ table, or the coefficients_ table otherwise.
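To make the shape of the result concrete, the sketch below applies an assumed power-law model y = b0 * x**b1 to a few rows and returns one ID/VALUE pair per input row. The helper `predict_rows`, the coefficient values, and the input rows are all hypothetical. The actual prediction runs inside PAL on the fitted model.

```python
def predict_rows(rows, b0, b1, key="ID", feature="X1"):
    """Apply y = b0 * x**b1 and return rows shaped like the documented output."""
    return [{key: row[key], "VALUE": b0 * row[feature] ** b1} for row in rows]

# Made-up input rows and coefficients, for illustration only.
new_data = [{"ID": 0, "X1": 1}, {"ID": 1, "X1": 4}, {"ID": 2, "X1": 9}]
predictions = predict_rows(new_data, b0=2.0, b1=1.5)
for row in predictions:
    print(row)
```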
- score(data, key=None, features=None, label=None)
Returns the coefficient of determination R² of the prediction.
- Parameters
- data : DataFrame
Data on which to assess model performance.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns. Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column (this is not the PAL default).
- Returns
- float
The coefficient of determination R² of the prediction on the given data.
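For reference, the usual textbook definition of the coefficient of determination is 1 - SS_res / SS_tot. The sketch below computes it in plain Python on made-up values. It is assumed, not confirmed here, that score() follows this same definition on the original (untransformed) scale.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares between observed and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the observed values.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up observed and predicted values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r2_score(y_true, y_pred)
print(round(r2, 4))
```

A value of 1 means the predictions match the observed values exactly; values near 0 mean the model does little better than predicting the mean.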
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.