Field-Aware Factorization Machine for classification

hanaml.FFMClassifier is an R wrapper for the SAP HANA PAL Field-Aware Factorization Machine (FFM) for classification.

hanaml.FFMClassifier(
  data = NULL,
  key = NULL,
  features = NULL,
  label = NULL,
  categorical.variable = NULL,
  delimiter = NULL,
  normalize = NULL,
  include.constant = NULL,
  include.linear = NULL,
  early.stop = NULL,
  factor.num = NULL,
  train.ratio = NULL,
  learning.rate = NULL,
  random.state = NULL,
  max.iter = NULL,
  linear.lambda = NULL,
  poly2.lambda = NULL,
  sgd.tol = NULL,
  sgd.exit.interval = NULL,
  handle.missing = NULL
)

Arguments

data	`DataFrame` DataFrame containing the data for factorization.
key	`character, optional` Name of the ID column. If not provided, the data is assumed to have no ID column. No default value.
features	`character or list of characters, optional` Names of the feature columns. If not provided, it defaults to all non-key, non-label columns of data.
label	`character, optional` Name of the column which specifies the dependent variable. Defaults to the last column of data if not provided.
categorical.variable	`character or list/vector of characters, optional` Indicates which features should be treated as categorical variables. By default, the treatment depends on the column type: "VARCHAR" and "NVARCHAR" columns are categorical, while "INTEGER" and "DOUBLE" columns are continuous. This parameter is valid only for "INTEGER" columns and is ignored for other types. No default value.
delimiter	`character, optional` The delimiter used to separate string features. For example, "China, USA" indicates two feature values, "China" and "USA". Valid only for string features. Defaults to "," (comma).
normalize	`logical, optional` Specifies whether to normalize each instance so that its L1 norm is 1. Defaults to TRUE.
include.constant	`logical, optional` Specifies whether or not to include the constant part in FFM model. Defaults to TRUE.
include.linear	`logical, optional` Specifies whether or not to include the linear weights in FFM model. Defaults to TRUE.
early.stop	`logical, optional` Specifies whether or not to stop the SGD optimisation early. Always TRUE if `train.ratio` is less than 1. Defaults to TRUE.
factor.num	`integer, optional` Length of the factor vectors. Defaults to 4.
train.ratio	`double, optional` The ratio of the data set used for training; the remaining data is used for validation. For example, 0.8 means that 80% of the data is used for training and 20% for validation. Defaults to 0.8 if the number of instances is not less than 40, 1.0 otherwise.
learning.rate	`double, optional` Specifies the learning rate (step size) for the optimization process. Defaults to 0.2.
random.state	`double, optional` Specifies the seed for random number generation: 0 means the current system time is used as the seed, and any other value is used directly as the seed. Defaults to 0.
max.iter	`integer, optional` Specifies the maximum number of iterations for optimization process. Defaults to 20.
linear.lambda	`double, optional` Specifies the penalization assigned to the L2 regularization term for linear weights. Defaults to 1e-5.
poly2.lambda	`double, optional` Specifies the penalization assigned to the L2 regularization term for quadratic factors. Defaults to 1e-5.
sgd.tol	`double, optional` Specifies the stopping criterion for the SGD algorithm: the algorithm exits when the cost function has not decreased by more than `sgd.tol` within `sgd.exit.interval` steps. Defaults to 1e-5.
sgd.exit.interval	`double, optional` Specifies the stopping criterion for the SGD algorithm: the algorithm exits when the cost function has not decreased by more than `sgd.tol` within `sgd.exit.interval` steps. Defaults to 5.
handle.missing	`c("remove", "replace"), optional` Specifies how to handle missing values in the features of `data`: `"remove"` removes rows with missing values; `"replace"` replaces missing values with 0.
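The `sgd.tol` and `sgd.exit.interval` arguments together decide when SGD stops. The R function below is a minimal sketch of one reading of that rule, for illustration only: `should_stop()` is a hypothetical helper, not part of this package, and PAL's internal check may differ in detail (for example, in how the step window is counted).

```r
# Hypothetical helper: one reading of the sgd.tol / sgd.exit.interval rule.
# `costs` holds the cost value recorded after each SGD step, oldest first.
should_stop <- function(costs, sgd.tol = 1e-5, sgd.exit.interval = 5) {
  n <- length(costs)
  # Not enough history yet to compare against an earlier window
  if (n <= sgd.exit.interval) return(FALSE)
  # Best cost reached before the most recent `sgd.exit.interval` steps
  earlier_best <- min(costs[seq_len(n - sgd.exit.interval)])
  # Best cost reached within the most recent `sgd.exit.interval` steps
  recent_best <- min(costs[(n - sgd.exit.interval + 1):n])
  # Stop if the recent window improved by no more than sgd.tol
  (earlier_best - recent_best) <= sgd.tol
}
```

For instance, `should_stop(c(1, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3))` returns `TRUE` because the cost has stayed flat over the last five steps, while a steadily decreasing cost sequence keeps the optimisation running.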

Value

A "FFMClassifier" object with the following attributes:

meta: DataFrame
Metadata of the trained model.
coef: DataFrame
Coefficients of the trained model.
stats: DataFrame
Statistical information about the trained model.

Details

FFM has proven effective for click-through rate (CTR) and conversion rate (CVR) prediction tasks. It builds on factorization machines (FM), which model sparse pairwise feature interactions by factorizing each interaction weight into the inner product of two latent vectors. The Field-Aware Factorization Machine adds the concept of a field, which represents a group of similar features; for example, a field of user properties may include gender, age, and occupation.
By making the factor vectors depend on fields as well as on features, the model has to learn, for each feature, a separate latent vector for every field. This increases the cost of evaluating the model on one instance to O(kn^2), where n is the number of non-zero features in the instance and k is the factor number, i.e., the length of the factor vectors.
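To make the field-aware interaction concrete, the sketch below evaluates the standard FFM model phi(x) = w0 + sum_j w_j x_j + sum_{j1<j2} <v_{j1,f(j2)}, v_{j2,f(j1)}> x_{j1} x_{j2} for one sparse instance, where f(j) is the field of feature j. The logistic link in `ffm_prob()` assumes a log-loss classification setup. `ffm_score()`, `ffm_prob()`, and the array layout `V[feature, field, k]` are hypothetical and do not describe how PAL stores the trained model.

```r
# Hypothetical sketch of FFM scoring for one sparse instance (not the PAL API).
#   feat  : indices of the instance's non-zero features
#   field : field index of each of those features
#   x     : values of those features
#   w0    : constant term; w : linear weights indexed by feature
#   V     : array V[feature, field, k], one latent vector per (feature, field)
ffm_score <- function(feat, field, x, w0, w, V) {
  # Constant and linear parts
  phi <- w0 + sum(w[feat] * x)
  m <- length(feat)
  if (m >= 2) {
    # Pairwise part: each feature uses the latent vector it keeps
    # for the other feature's field
    for (a in 1:(m - 1)) {
      for (b in (a + 1):m) {
        va <- V[feat[a], field[b], ]
        vb <- V[feat[b], field[a], ]
        phi <- phi + sum(va * vb) * x[a] * x[b]
      }
    }
  }
  phi
}

# Assumed logistic link turning the score into a class probability
ffm_prob <- function(feat, field, x, w0, w, V) {
  1 / (1 + exp(-ffm_score(feat, field, x, w0, w, V)))
}

V <- array(0, dim = c(3, 2, 2))  # 3 features, 2 fields, k = 2
V[1, 2, ] <- c(1, 2)             # feature 1's vector for field 2
V[2, 1, ] <- c(3, 4)             # feature 2's vector for field 1
# 0.5 + (0.1 + 0.2) + (1*3 + 2*4) = 11.8
ffm_score(feat = c(1, 2), field = c(1, 2), x = c(1, 1),
          w0 = 0.5, w = c(0.1, 0.2, 0.3), V = V)
```

The double loop over pairs of non-zero features, each pair costing one length-k inner product, is where the O(kn^2) cost per instance comes from.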
In practice, all features generated from the same categorical variable are treated as belonging to the same field. FFM is best suited to categorical features; a numeric feature is either treated as a single field or discretized into categories. If all features are numeric and placed in one field, each feature has only one latent vector and FFM degenerates to FM.
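The following sketch shows how one row might be expanded into (feature, field, value) triples: each category of a categorical column becomes a binary feature in that column's field, a delimited string such as "China, USA" yields one feature per value, and a numeric column stays a single feature in its own field. `encode_row()` is hypothetical; the `FIELD:VALUE` naming only mirrors the FEATURE column of the example output below, the optional L1 rescaling imitates `normalize = TRUE`, and PAL's actual preprocessing may differ.

```r
# Hypothetical sketch: turn one row into FFM (feature, field, value) triples.
# Not the PAL preprocessing; "FIELD:VALUE" names only mirror the example output.
encode_row <- function(row, categorical = character(0), delimiter = ",",
                       normalize = TRUE) {
  feature <- character(0)
  field <- character(0)
  value <- numeric(0)
  for (col in names(row)) {
    v <- row[[col]]
    if (col %in% categorical || is.character(v)) {
      # Split delimited strings; every category is a binary feature in field `col`
      tokens <- trimws(strsplit(as.character(v), delimiter, fixed = TRUE)[[1]])
      feature <- c(feature, paste0(col, ":", tokens))
      field <- c(field, rep(col, length(tokens)))
      value <- c(value, rep(1, length(tokens)))
    } else {
      # A numeric column is a single feature forming its own field
      feature <- c(feature, col)
      field <- c(field, col)
      value <- c(value, as.numeric(v))
    }
  }
  # Mimic normalize = TRUE: rescale so the instance's L1 norm is 1
  if (normalize && sum(abs(value)) > 0) value <- value / sum(abs(value))
  data.frame(feature = feature, field = field, value = value,
             stringsAsFactors = FALSE)
}

encode_row(list(USER = "A", COUNTRY = "China, USA", AGE = 2))
```

The final call yields four features (`USER:A`, `COUNTRY:China`, `COUNTRY:USA`, `AGE`) in three fields, with values 0.2, 0.2, 0.2 and 0.4 after L1 normalization. Passing `categorical = "TIMESTAMP"` would likewise turn an integer TIMESTAMP column into features such as `TIMESTAMP:3`.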
FFM can be applied to a variety of prediction tasks, for example, binary classification, regression, and ranking.

Examples

> data$Head(5)
  USER  MOVIE TIMESTAMP       CTR
1    A Movie1         3     Click
2    A Movie2         3     Click
3    A Movie4         1 Not click
4    A Movie5         2     Click
5    A Movie6         3     Click

Call the function:

> FFMClsf <- hanaml.FFMClassifier(data = data,
                                  categorical.variable = "TIMESTAMP",
                                  delimiter = ",", factor.num = 4,
                                  early.stop = TRUE, learning.rate = 0.2,
                                  max.iter = 20, train.ratio = 0.8,
                                  linear.lambda = 1e-5,
                                  poly2.lambda = 1e-6, random.state = 1)

Output:

> FFMClsf$coefficient$Head(5)
  COEFF_INDEX FEATURE FIELD  K COEFFICIENT
1           0       c  <NA> NA -0.03166240
2           1  USER:A  <NA> NA -0.13690224
3           2  USER:B  <NA> NA -0.04620829
4           3  USER:C  <NA> NA  0.10801253
5           4  USER:D  <NA> NA -0.04806942
