hanaml.FRM is an R wrapper for SAP HANA PAL Factorized Polynomial Regression Models (FRM).

hanaml.FRM(
  data = NULL,
  key = NULL,
  user.info = NULL,
  item.info = NULL,
  categorical.variable = NULL,
  user.categorical.variable = NULL,
  item.categorical.variable = NULL,
  solver = NULL,
  factor.num = NULL,
  init.variance = NULL,
  random.state = NULL,
  learning.rate = NULL,
  linear.lambda = NULL,
  poly2.lambda = NULL,
  max.iter = NULL,
  sgd.tol = NULL,
  sgd.exit.interval = NULL,
  momentum = NULL,
  thread.ratio = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  param.search.strategy = NULL,
  repeat.times = NULL,
  progress.indicator.id = NULL,
  random.search.times = NULL,
  timeout = NULL,
  parameter.values = NULL,
  parameter.range = NULL
)

Arguments

data

DataFrame
DataFrame containing user-item interaction data and global side features, structured as follows:

  • ID

  • USER ID column

  • ITEM ID column

  • Side feature columns

  • Feedback/rating column

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

user.info

DataFrame
DataFrame containing side features about users, structured as follows:

  • USER ID column

  • Side feature columns

item.info

DataFrame
DataFrame containing side features about items, structured as follows:

  • ITEM ID column

  • Side feature columns

categorical.variable

character or list/vector of characters, optional
Indicates which features should be treated as categorical variables.
The default behavior depends on the data type of the input column:

  • "VARCHAR" and "NVARCHAR": categorical

  • "INTEGER" and "DOUBLE": continuous.

Valid only for variables of "INTEGER" type; ignored otherwise.
No default value.

user.categorical.variable

list/vector of characters, optional
Names of the columns in user.info that should be treated as categorical variables even though their data type is INTEGER.
By default, 'VARCHAR' and 'NVARCHAR' columns are categorical, and 'INTEGER' and 'DOUBLE' columns are continuous.

item.categorical.variable

list/vector of characters, optional
Names of the columns in item.info that should be treated as categorical variables even though their data type is INTEGER.
By default, 'VARCHAR' and 'NVARCHAR' columns are categorical, and 'INTEGER' and 'DOUBLE' columns are continuous.

solver

c("sgd", "momentum", "nag", "adagrad"), optional
Specifies the optimization solver used to train the model:

  • "sgd": Stochastic Gradient Descent.

  • "momentum": Momentum.

  • "nag": Nesterov Accelerated Gradient.

  • "adagrad": Adaptive Gradient algorithm.

Defaults to "sgd".
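
As a rough illustration of how the four solver options differ, the sketch below applies the textbook update rule of each method to a toy one-dimensional cost f(w) = 0.5 * w^2. The helper run_solver, the toy cost, and the step sizes are assumptions made for this illustration only; PAL's internal implementation may differ in detail.

```r
# Textbook update rules on the toy cost f(w) = 0.5 * w^2 (gradient: w).
# run_solver is a hypothetical helper, not part of hanaml or PAL.
run_solver <- function(solver, w = 1, lr = 0.1, mu = 0.9, steps = 100) {
  grad <- function(w) w   # gradient of the toy cost
  v <- 0                  # velocity, used by "momentum" and "nag"
  G <- 0                  # accumulated squared gradients, used by "adagrad"
  for (i in seq_len(steps)) {
    if (solver == "sgd") {
      w <- w - lr * grad(w)
    } else if (solver == "momentum") {
      v <- mu * v - lr * grad(w)
      w <- w + v
    } else if (solver == "nag") {
      v <- mu * v - lr * grad(w + mu * v)  # gradient at the look-ahead point
      w <- w + v
    } else if (solver == "adagrad") {
      g <- grad(w)
      G <- G + g^2
      w <- w - lr * g / (sqrt(G) + 1e-8)   # step scaled by gradient history
    } else {
      stop("unknown solver: ", solver)
    }
  }
  w
}

solvers <- c("sgd", "momentum", "nag", "adagrad")
final_w <- sapply(solvers, run_solver)
print(final_w)  # each solver ends closer to the minimum at 0 than the start w = 1
```

In this sketch mu plays the role of the momentum argument and lr the role of learning.rate; both are only read by the solvers that need them.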

factor.num

integer, optional
Length of the factor vectors.
Defaults to 8.

init.variance

numeric, optional
Variance of the normal distribution used to initialize the model parameters.
Defaults to 1e-2.

random.state

numeric, optional
Specifies the seed for random number generation, where 0 means current system time is used as seed, and other values are simply real seed values.
Defaults to 0.

learning.rate

numeric, optional
Specifies the learning rate (step size) for the optimization process. If it is left at the default value 0, the function chooses the step size automatically, based on a small part of the dataset.
Defaults to 0.

linear.lambda

numeric, optional
Specifies the penalization assigned to the L2 regularization term of linear weights.
Defaults to 1e-10.

poly2.lambda

numeric, optional
Specifies the penalization assigned to the L2 regularization term of quadratic factors.
Defaults to 1e-8.

max.iter

integer, optional
Specifies the maximum number of iterations for the optimization process.
Defaults to 50.

sgd.tol

numeric, optional
Specifies the tolerance of the stopping criterion. The algorithm exits when the cost function has not decreased by more than sgd.tol in sgd.exit.interval steps.
Defaults to 1e-5.

sgd.exit.interval

numeric, optional
Specifies the step interval of the stopping criterion. The algorithm exits when the cost function has not decreased by more than sgd.tol in sgd.exit.interval steps.
Defaults to 5.
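
One way to read the stopping rule shared by sgd.tol and sgd.exit.interval is sketched below. The helper should_stop and the cost histories are hypothetical; the exact comparison PAL performs internally is not documented here, so treat this as an interpretation rather than the implementation.

```r
# Hypothetical helper: stop when the cost has improved by no more than `tol`
# over the last `exit_interval` steps.
should_stop <- function(cost_history, tol = 1e-5, exit_interval = 5) {
  n <- length(cost_history)
  if (n <= exit_interval) {
    return(FALSE)  # not enough history yet to compare against
  }
  improvement <- cost_history[n - exit_interval] - cost_history[n]
  improvement <= tol
}

flat_costs    <- c(1, 0.5, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4)  # plateau reached
falling_costs <- c(10, 9, 8, 7, 6, 5, 4)                   # still improving

print(should_stop(flat_costs))     # TRUE
print(should_stop(falling_costs))  # FALSE
```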

momentum

numeric, optional
Specifies the momentum value for the Momentum or NAG method. Only valid when solver is "momentum" or "nag".
Defaults to 0.9.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

resampling.method

{"cv", "bootstrap"}, optional
Specifies the resampling method used for model evaluation and parameter selection.
If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.

evaluation.metric

{"rmse"}, optional
Specifies the evaluation metric for model evaluation or parameter selection. Only RMSE is supported.
If not specified, neither model evaluation nor parameter selection is activated.

fold.num

integer, optional
Specifies the number of folds for cross-validation (cv). Mandatory and valid only when resampling.method is "cv".

param.search.strategy

{'grid', 'random'}, optional
Specifies the search method used for parameter selection. If not specified, parameter selection is not triggered.

repeat.times

numeric, optional
Specifies the number of times resampling is repeated.
Defaults to 1.

progress.indicator.id

character, optional
Sets the ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.

random.search.times

integer, optional
Specifies the number of times to randomly select candidate parameters for selection.
Mandatory and valid only when param.search.strategy is set to "random".

timeout

integer, optional
Specifies the maximum running time, in seconds, for model evaluation or parameter selection. A value of 0 means no timeout.

parameter.values

named list/vector, optional
Specifies the candidate values of the following parameters for parameter selection:
factor.num, linear.lambda, poly2.lambda, momentum.

parameter.range

named list/vector, optional
Specifies the ranges of the following parameters for parameter selection:
factor.num, linear.lambda, poly2.lambda, momentum.
Each parameter range should be specified by 3 numbers in the form c(start, step, end).
Example:
parameter.range <- list(factor.num = c(3, 1, 6)).
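
To make the c(start, step, end) convention concrete, the sketch below expands a range into the candidate values it would describe. The helper expand_range is hypothetical, and the assumption that both endpoints are included is ours; it is not confirmed by this page.

```r
# Hypothetical helper: turn c(start, step, end) into explicit candidates.
# Assumes the range is inclusive at both ends.
expand_range <- function(r) {
  stopifnot(is.numeric(r), length(r) == 3)
  seq(from = r[1], to = r[3], by = r[2])
}

parameter.range <- list(factor.num = c(3, 1, 6))
candidates <- lapply(parameter.range, expand_range)
print(candidates$factor.num)  # 3 4 5 6
```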

Value

An "FRM" object with the following attributes:

  • model.meta.data: DataFrame. Metadata of the trained model.

  • model: DataFrame. Weights of the trained model.

  • model.factors: DataFrame. Factors of the trained model.

  • iter.info: DataFrame. Information recorded during the optimization process.

  • stat: DataFrame. Statistics for model evaluation/parameter selection. Available only when model evaluation/parameter selection is enabled.

  • optim.param: DataFrame. The selected optimal parameters. Available only when model evaluation/parameter selection is enabled.

Details

Factorized Polynomial Regression Models have proven to be a powerful tool for prediction applications such as recommendation. They combine the advantages of polynomial regression models with those of factorization models. Unlike SVMs, they allow reliable parameter estimation under very sparse data, where only a few observations of higher-order effects are available. Because the higher-order interactions are factorized, factorization machines (FMs) can be computed with linear complexity.

In a recommendation setting, side features can describe:

  • each user-item rating/transaction, for example the location or time at which a movie was rated by or lent to a user

  • each user, for example gender, age, or education

  • each item, such as the genre of a movie
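
The linear-complexity claim can be checked numerically. The sketch below assumes the standard second-order factorization machine form, y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, which this page does not spell out. All names and values (fm_naive, fm_linear, w0, w, V, x) are made up for the illustration and are not taken from a PAL model.

```r
# Toy model with n features and k factors; values are arbitrary.
set.seed(1)
n <- 6
k <- 3
w0 <- 0.1
w  <- rnorm(n, sd = 0.1)
V  <- matrix(rnorm(n * k, sd = 0.1), nrow = n, ncol = k)
x  <- c(1, 0, 0, 1, 0, 0.5)

# Direct evaluation: loops over all feature pairs, O(k * n^2).
fm_naive <- function(x, w0, w, V) {
  n <- length(x)
  pair_sum <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      pair_sum <- pair_sum + sum(V[i, ] * V[j, ]) * x[i] * x[j]
    }
  }
  w0 + sum(w * x) + pair_sum
}

# Factorized evaluation, O(k * n), using the identity
# sum_{i<j} <v_i, v_j> x_i x_j
#   = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2)
fm_linear <- function(x, w0, w, V) {
  s  <- as.vector(crossprod(V, x))       # per factor: sum_i v_if x_i
  s2 <- as.vector(crossprod(V^2, x^2))   # per factor: sum_i v_if^2 x_i^2
  w0 + sum(w * x) + 0.5 * sum(s^2 - s2)
}

y_naive  <- fm_naive(x, w0, w, V)
y_linear <- fm_linear(x, w0, w, V)
print(all.equal(y_naive, y_linear))  # TRUE
```

Both functions return the same prediction; only the second avoids the double loop over feature pairs, which is what makes sparse, high-dimensional inputs tractable.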

Examples

> data$Head(5)$Collect()
 ID USER    MOVIE TIMESTAMP RATING
1 1    A   Movie1         3    4.8
2 2    A   Movie2         3    4.0
3 3    A   Movie4         1    4.0
4 4    A   Movie5         2    4.0
5 5    A   Movie6         3    4.8

> user.info$Collect()
     USER     USER_SIDE_FEATURE
1      NA                    NA

> item.info$Head(5)$Collect()
   MOVIE  GENRES
1 Movie1  Sci-Fi
2 Movie2  Drama,Romance
3 Movie3  Drama,Sci-Fi
4 Movie4  Crime,Drama
5 Movie5  Crime,Drama

Call the function:

FM <- hanaml.FRM(data = data,
                 user.info = user.info,
                 item.info = item.info,
                 categorical.variable = "TIMESTAMP",
                 resampling.method = "cv",
                 solver = "momentum",
                 learning.rate = 0,
                 max.iter = 100,
                 param.search.strategy = "grid",
                 evaluation.metric = "rmse",
                 fold.num = 5, repeat.times = 1, timeout = 0,
                 progress.indicator.id = "PAL_FRM",
                 thread.ratio = 0.5, random.state = 1,
                 parameter.range = list(factor.num = c(1, 1, 3)),
                 parameter.values = list(linear.lambda = c(1e-6, 1e-8, 1e-10),
                                         poly2.lambda = c(1e-4, 1e-6, 1e-8),
                                         momentum = c(0.8, 0.9)))

Output:

> FM$model$Head(5)$Collect()
 ID MAP      WEIGHT
1 0   A -0.01932846
2 1   B  0.73047553
3 2   C -0.22821216
4 3   D  0.05358953
5 4   E  0.03182115

> FM$optim.param$Collect()
             PARAM_NAME INT_VALUE numeric_VALUE STRING_VALUE
1              MOMENTUM        NA         9e-01         <NA>
2        REGULARIZATION        NA         1e-08         <NA>
3 LINEAR_REGULARIZATION        NA         1e-06         <NA>
4         FACTOR_NUMBER         1            NA         <NA>
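
To see how large the grid searched by the call above is, the sketch below enumerates the candidate combinations locally. It assumes that parameter.range is expanded inclusively as seq(start, end, by = step), and that REGULARIZATION and LINEAR_REGULARIZATION in optim.param correspond to poly2.lambda and linear.lambda; neither assumption is confirmed by this page.

```r
# Candidate values copied from the hanaml.FRM call above.
# The inclusive seq() expansion of factor.num is an assumption.
candidates <- expand.grid(
  factor.num    = seq(1, 3, by = 1),
  linear.lambda = c(1e-6, 1e-8, 1e-10),
  poly2.lambda  = c(1e-4, 1e-6, 1e-8),
  momentum      = c(0.8, 0.9)
)
print(nrow(candidates))  # 3 * 3 * 3 * 2 = 54 combinations

# The combination reported in optim.param (under the name mapping assumed
# above) is one of these candidates.
selected <- subset(candidates,
                   factor.num == 1 & linear.lambda == 1e-6 &
                     poly2.lambda == 1e-8 & momentum == 0.9)
print(selected)
```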

See also