Polynomial Regression

hanaml.PolynomialRegression is an R wrapper for SAP HANA PAL Polynomial Regression.

hanaml.PolynomialRegression(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  formula = NULL,
  degree = NULL,
  decomposition = NULL,
  adjusted.r2 = NULL,
  pmml.export = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  random.state = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  parameter.range = NULL,
  parameter.values = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

features

character, optional
Name of the feature column.
If not provided, it defaults to the first non-key, non-label column of data.

label

character, optional
Name of the column which specifies the dependent variable.
Defaults to the last column of data if not provided.

formula

formula type, optional
Formula to be used for model generation, in the form label ~ <feature_list>, e.g. formula = CATEGORY ~ V1 + V2 + V3.
Provide either the formula or a features and label combination, but not both.
Defaults to NULL.
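
The two ways of naming the model columns can be sketched side by side. This is a minimal illustration, assuming `data` is a DataFrame with the columns ID, Y and X1 used in the Examples section. The helper names `fit_with_formula` and `fit_with_columns` are hypothetical and only keep the two calls apart; they are not part of the package.

```r
# Hypothetical helpers; both assume `data` has columns ID, Y and X1.

# Variant 1: describe the model with a formula.
fit_with_formula <- function(data) {
  hanaml.PolynomialRegression(data, key = "ID", formula = Y ~ X1,
                              degree = 3L)
}

# Variant 2: name the feature and label columns explicitly.
fit_with_columns <- function(data) {
  hanaml.PolynomialRegression(data, key = "ID", features = "X1",
                              label = "Y", degree = 3L)
}
```

Only one of the two variants should be used in a given call, since formula and the features/label pair must not be supplied together.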

degree

integer
Degree of the polynomial model.

decomposition

c("LU", "QR", "SVD", "Cholesky"), optional
Specifies the decomposition method (case-insensitive).

"LU": Doolittle decomposition.
"QR": QR decomposition.
"SVD": singular value decomposition.
"Cholesky": Cholesky decomposition.

Defaults to "QR".
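
As a rough illustration of why several decomposition methods are offered, the base R sketch below solves the same least-squares problem through a QR route and a Cholesky route. It uses the seven rows from the Examples section and does not call PAL. How PAL implements each method internally is not described here, so this only shows that the routes arrive at the same coefficients.

```r
# Data from the Examples section.
x <- 1:7
y <- c(5, 20, 43, 89, 166, 247, 403)

# Design matrix for a degree-3 polynomial: columns 1, x, x^2, x^3.
X <- outer(x, 0:3, `^`)

# QR route: solve the least-squares problem directly on X.
beta_qr <- qr.solve(X, y)

# Cholesky route: factor the normal equations X'X b = X'y as R'R b = X'y,
# then solve the two triangular systems.
R <- chol(crossprod(X))
beta_chol <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))

# Largest disagreement between the two routes (rounding error only).
max_diff <- max(abs(beta_qr - as.vector(beta_chol)))
```

The routes differ mainly in cost and numerical robustness: forming X'X squares the condition number of the problem, which is one reason a QR-based default is a common choice.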

adjusted.r2

logical, optional
If TRUE, include the adjusted R^2 value in the statistics table.
Defaults to FALSE.

pmml.export

c("no", "single-row", "multi-row"), optional
Controls whether to output a PMML representation of the model, and how to format the PMML.

"no": No PMML model.
"single-row": Exports a PMML model in a maximum of one row. Fails if the model doesn't fit in one row.
"multi-row": Exports a PMML model, splitting it across multiple rows if it doesn't fit in one.

Defaults to "no".

resampling.method

character, optional
Specifies the resampling method for model evaluation or parameter selection.
Valid options include: "cv", "bootstrap".
If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.

evaluation.metric

character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Currently the only valid value is "RMSE".
Must be specified together with resampling.method to activate model evaluation or parameter selection.

fold.num

integer, optional
Specifies the number of folds for cross-validation ("cv").
Mandatory and valid only when resampling.method is "cv".
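
To make the interplay of resampling.method = "cv", evaluation.metric = "RMSE" and fold.num more concrete, here is a conceptual base R sketch of k-fold cross-validation for a polynomial model. It runs locally on the rows from the Examples section and does not call PAL. The round-robin fold assignment and the names `fold_id` and `cv_rmse` are assumptions made for illustration, and PAL may partition the data differently.

```r
# Data from the Examples section.
x <- 1:7
y <- c(5, 20, 43, 89, 166, 247, 403)

# Assign each row to one of `fold_num` folds in round-robin order.
fold_num <- 3L
fold_id <- rep_len(seq_len(fold_num), length(x))

# Cross-validated RMSE of a polynomial of the given degree:
# fit on all folds but one, predict the held-out fold, repeat.
cv_rmse <- function(degree) {
  sq_err <- numeric(length(x))
  for (k in seq_len(fold_num)) {
    held_out <- fold_id == k
    train <- data.frame(x = x[!held_out], y = y[!held_out])
    fit <- lm(y ~ poly(x, degree, raw = TRUE), data = train)
    pred <- predict(fit, newdata = data.frame(x = x[held_out]))
    sq_err[held_out] <- (y[held_out] - pred)^2
  }
  sqrt(mean(sq_err))
}

# One RMSE value per candidate degree.
rmse_by_degree <- sapply(1:3, cv_rmse)
```

With only seven rows the individual RMSE values are noisy, so the sketch is meant to show the mechanics rather than to recommend a degree.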

repeat.times

numeric, optional
Specifies the number of repeat times for resampling.
Defaults to 1.

param.search.strategy

c("grid", "random"), optional
Specifies the search method used for parameter selection. If not specified, parameter selection is not triggered.

random.search.times

integer, optional
Specifies the number of times to randomly select candidate parameters for selection. Mandatory and valid only when param.search.strategy is "random".

random.state

integer, optional
Specifies the seed for random number generation. A value of 0 uses the current system time as the seed; any other value is used directly as the seed.

timeout

integer, optional
Specifies maximum running time for model evaluation or parameter selection in seconds. No timeout when 0 is specified.

progress.indicator.id

character, optional
Sets the ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.

parameter.range

named list/vector, optional
Specifies the range of the degree parameter for parameter selection.
Parameter range should be specified by 3 numbers in the form of c(start, step, end).
If param.search.strategy is "random", then step has no effect and thus can be omitted.
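
The c(start, step, end) layout can be pictured with a short base R sketch. The range values are hypothetical, and the sketch assumes the range is inclusive at both ends, which is not stated here.

```r
# Hypothetical range in the c(start, step, end) layout.
degree_range <- c(1, 1, 4)

# Candidate degrees a grid search over this range would visit,
# assuming both end points are included.
grid_candidates <- seq(from = degree_range[1], to = degree_range[3],
                       by = degree_range[2])
```

Under that assumption, the range c(1, 1, 4) corresponds to the candidate degrees 1, 2, 3 and 4.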

parameter.values

named list/vector, optional
Specifies values of the degree parameter for parameter selection.
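
A call that combines the resampling and parameter-selection arguments might look like the sketch below. It assumes `data` is the DataFrame from the Examples section. The helper name `fit_with_degree_search` is hypothetical, the list(degree = ...) layout is one reading of "named list/vector", and leaving degree unset while it is being searched is also an assumption.

```r
# Hypothetical helper; assumes `data` has columns ID, Y and X1.
fit_with_degree_search <- function(data) {
  hanaml.PolynomialRegression(
    data,
    key = "ID",
    formula = Y ~ X1,
    resampling.method = "cv",
    evaluation.metric = "RMSE",
    fold.num = 3L,
    param.search.strategy = "grid",
    # Assumed layout: a named list keyed by the parameter name.
    parameter.values = list(degree = c(1L, 2L, 3L))
  )
}
```

After such a call, the selected degree would be read from the optim.param element of the returned object, for example with pr$optim.param$Collect().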

Value

coefficients : DataFrame
Fitted regression coefficients.
pmml : DataFrame
PMML model. Set to NULL if no PMML model is requested.
model : DataFrame
The saved model, holding either the coefficients or the PMML model. If a PMML model is requested, model is the PMML model; otherwise it is the coefficients.
fitted : DataFrame
Predicted dependent variable values for training data. Set to NULL if the training data has no row IDs.
statistics : DataFrame
Regression-related statistics, such as mean squared error.
optim.param : DataFrame
The selected optimal degree parameter.

Details

Polynomial regression is an approach to modeling the relationship between a scalar variable y and a variable denoted X. In polynomial regression, data is modeled using polynomial functions, and unknown model parameters are estimated from the data. Such models are called polynomial models.
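
For readers who want to see the model form without a HANA connection, the following base R sketch fits a degree-3 polynomial to the seven rows from the Examples section using lm. It is a local illustration of the technique, not the PAL implementation.

```r
# Training data from the Examples section.
train <- data.frame(
  X1 = 1:7,
  Y  = c(5, 20, 43, 89, 166, 247, 403)
)

# Degree-3 polynomial: Y = b0 + b1*X1 + b2*X1^2 + b3*X1^3.
fit <- lm(Y ~ X1 + I(X1^2) + I(X1^3), data = train)

# Coefficients in the order intercept, X1, X1^2, X1^3.
coefs <- unname(coef(fit))
print(round(coefs, 6))
```

Rounded to six decimals, the estimates are -11, 17.25, -3.416667 and 1.333333, the same values as the coefficient table in the Examples section.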

Examples

Input DataFrame data:


> data$Collect()
  ID   Y X1
1  0   5  1
2  1  20  2
3  2  43  3
4  3  89  4
5  4 166  5
6  5 247  6
7  6 403  7

Call the function:


> pr <- hanaml.PolynomialRegression(data, key = "ID", formula = Y ~ X1,
                                    degree = 3L, pmml.export = "multi-row")

Output:


> pr$coefficients$Collect()
       VARIABLE_NAME COEFFICIENT_VALUE
1  __PAL_INTERCEPT__        -11.000000
2 X1__PAL_DELIMIT__1         17.250000
3 X1__PAL_DELIMIT__2         -3.416667
4 X1__PAL_DELIMIT__3          1.333333
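
The coefficient table can be turned into predictions by evaluating the polynomial directly. The base R sketch below assumes that X1__PAL_DELIMIT__k holds the coefficient of X1 raised to the power k, a reading that is consistent with an ordinary least-squares fit of the same data. The helper name `predict_poly` is hypothetical.

```r
# Coefficient values copied from the output above, lowest power first.
coefficients <- c(-11.000000, 17.250000, -3.416667, 1.333333)

# Evaluate b0 + b1*x + b2*x^2 + b3*x^3 for each x.
predict_poly <- function(x, coefs) {
  powers <- outer(x, seq_along(coefs) - 1, `^`)
  as.vector(powers %*% coefs)
}

# Fitted values for the training inputs X1 = 1, ..., 7.
fitted_values <- predict_poly(1:7, coefficients)
```

For example, the fitted value at X1 = 2 comes out at about 20.5, against an observed Y of 20.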
