BiVariateNaturalLogarithmicRegression
- class hana_ml.algorithms.pal.regression.BiVariateNaturalLogarithmicRegression(decomposition=None, adjusted_r2=None, pmml_export=None, thread_ratio=0.0)
Bi-variate natural logarithmic regression models the relationship between a scalar dependent variable y and a single independent variable X. The data is modeled with a natural logarithmic function of X, and the unknown model parameters are estimated from the data. Such models are called natural logarithmic models.
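As a rough illustration of the model form, the sketch below assumes the usual natural logarithmic model y = b0 + b1 * ln(X) and fits it by ordinary least squares in plain Python. The function names and the data are hypothetical. This is not the PAL implementation, which runs inside SAP HANA and may solve the problem differently (see the decomposition parameter).

```python
import math


def fit_log_model(xs, ys):
    """Least-squares fit of y = b0 + b1 * ln(x); returns (b0, b1).

    Hypothetical helper for illustration only; every x must be > 0.
    """
    zs = [math.log(x) for x in xs]  # transform the single feature
    n = len(zs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    sxx = sum((z - z_mean) ** 2 for z in zs)
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    b1 = sxy / sxx                # slope on ln(x)
    b0 = y_mean - b1 * z_mean     # intercept
    return b0, b1


def predict_log_model(b0, b1, xs):
    """Evaluate the fitted model at new x values."""
    return [b0 + b1 * math.log(x) for x in xs]


# Hypothetical data generated exactly from y = 2 + 3 * ln(x).
xs = [1.0, 2.0, 3.0, 5.0, 6.0]
ys = [2.0 + 3.0 * math.log(x) for x in xs]
b0, b1 = fit_log_model(xs, ys)
```

Because ln(1) = 0, the prediction at X = 1 equals the intercept b0, which is a quick sanity check on any fitted model of this form.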
- Parameters
- decomposition : {'LU', 'QR', 'SVD', 'Cholesky'}, optional
Matrix factorization type to use. Case-insensitive.
'LU': LU decomposition.
'QR': QR decomposition.
'SVD': singular value decomposition.
'Cholesky': Cholesky (LDLT) decomposition.
Defaults to QR decomposition.
- adjusted_r2 : bool, optional
If True, include the adjusted R² value in the statistics table.
Defaults to False.
- pmml_export : {'no', 'single-row', 'multi-row'}, optional
Controls whether to output a PMML representation of the model, and how to format the PMML. Case-insensitive.
'no' or not provided: No PMML model.
'single-row': Exports a PMML model in a maximum of one row. Fails if the model doesn't fit in one row.
'multi-row': Exports a PMML model, splitting it across multiple rows if it doesn't fit in one.
Prediction does not require a PMML model.
- thread_ratio : float, optional
Controls the proportion of available threads to use for fitting.
The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates up to all available threads.
Values between 0 and 1 use that proportion of the available threads.
Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.
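The thread_ratio rules can be read as a simple mapping from a ratio to a thread count. The sketch below is only an illustration of that reading: the function name is hypothetical, and the rounding (round up, at least one thread) is an assumption, since the documentation does not say how PAL rounds fractional thread counts.

```python
import math


def threads_for_ratio(thread_ratio, available):
    """Illustrative mapping from thread_ratio to a thread count.

    Returns None where PAL would choose the count heuristically.
    Hypothetical helper; the rounding behaviour is an assumption.
    """
    if thread_ratio < 0 or thread_ratio > 1:
        return None  # outside [0, 1]: left to PAL's heuristic
    # 0 means a single thread; 1 means up to all available threads.
    return max(1, math.ceil(thread_ratio * available))
```

For example, with 8 available threads this sketch yields 1 thread for a ratio of 0, 4 threads for 0.5, and 8 threads for 1.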
Examples
>>> df.collect()
   ID    Y  X1
0   0   10   1
1   1   80   2
2   2  130   3
3   3  180   5
4   4  190   6
Training the model:
>>> gr = BiVariateNaturalLogarithmicRegression(pmml_export='multi-row')
>>> gr.fit(data=df, key='ID')
Prediction:
>>> df2.collect()
   ID  X1
0   0   1
1   1   2
2   2   3
3   3   4
4   4   5
>>> gr.predict(data=df2, key='ID').collect()
   ID               VALUE
0   0         14.86160299
1   1    82.9935329364932
2   2   122.8481570569525
3   3   151.1254628829864
4   4  173.05904529166017
- Attributes
- coefficients_ : DataFrame
Fitted regression coefficients.
- pmml_ : DataFrame
PMML model. Set to None if no PMML model was requested.
- fitted_ : DataFrame
Predicted dependent variable values for training data. Set to None if the training data has no row IDs.
- statistics_ : DataFrame
Regression-related statistics, such as mean squared error.
Methods
fit(data[, key, features, label])                  Fit regression model based on training data.
predict(data[, key, features, model_format, ...])  Predict dependent variable values based on fitted model.
score(data[, key, features, label])                Returns the coefficient of determination R² of the prediction.
- fit(data, key=None, features=None, label=None)
Fit regression model based on training data.
- Parameters
- data : DataFrame
Training data.
- key : str, optional
Name of the ID column.
If key is not provided, then:
if data is indexed by a single column, then key defaults to that index column;
otherwise, it is assumed that data contains no ID column.
- features : list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column (this is not the PAL default).
- Returns
- Fitted object.
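The defaulting rules for key and label described for fit() can be paraphrased in a few lines. The sketch below is a plain-Python restatement of those rules, not hana_ml code: both helper names are hypothetical, and the index is modelled simply as a list of column names.

```python
def resolve_key(key, index_columns):
    """Return the ID column name, or None if the data has no ID column.

    Hypothetical helper mirroring the documented rule for `key`.
    """
    if key is not None:
        return key                   # an explicit key always wins
    if len(index_columns) == 1:
        return index_columns[0]      # single index column becomes the key
    return None                      # otherwise: assume no ID column


def resolve_label(label, columns, key):
    """Return the label column, defaulting to the last non-ID column."""
    if label is not None:
        return label
    non_id = [c for c in columns if c != key]
    return non_id[-1]
```

Under these rules the column order matters when label is omitted, so passing label explicitly avoids surprises when the dependent variable is not the last column.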
- predict(data, key=None, features=None, model_format=None, thread_ratio=0.0)
Predict dependent variable values based on fitted model.
- Parameters
- data : DataFrame
Independent variable values used for prediction.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
- model_format : int or str, optional (deprecated)
0 or 'coefficient': use the coefficient table as the model for prediction.
1 or 'pmml': use the PMML table as the model for prediction.
Defaults to 'coefficient'.
Deprecated; this parameter no longer has any effect.
- thread_ratio : float, optional
Controls the proportion of available threads to use for prediction.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads.
Values between 0 and 1 will use that percentage of available threads.
Values outside this range tell PAL to heuristically determine the number of threads to use.
Does not affect fitting.
Defaults to 0.
- Returns
- DataFrame
Predicted values, structured as follows:
ID column, with the same name and type as data's ID column.
VALUE, type DOUBLE, representing predicted values.
Note
predict() will pass the pmml_ table to PAL as the model representation if there is a pmml_ table, or the coefficients_ table otherwise.
- score(data, key=None, features=None, label=None)
Returns the coefficient of determination R² of the prediction.
- Parameters
- data : DataFrame
Data on which to assess model performance.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column (this is not the PAL default).
- Returns
- float
The coefficient of determination R² of the prediction on the given data.
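For reference, the coefficient of determination is conventionally defined as R² = 1 - SS_res / SS_tot. The sketch below computes it in plain Python from observed and predicted values. The function name is hypothetical, and it is an assumption that score() follows exactly this textbook definition.

```python
def r2_score(y_true, y_pred):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper; requires y_true to have non-zero variance.
    """
    y_mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and a model that always predicts the mean of the observed values gives 0.0.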
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.