Alternating Least Squares

Alternating least squares (ALS) is a matrix factorization algorithm for building recommender systems from either explicit or implicit feedback.

hanaml.ALS(
  data = NULL,
  key = NULL,
  used.cols = NULL,
  factors = NULL,
  lambda = NULL,
  max.iter = NULL,
  tol = NULL,
  exit.interval = NULL,
  implicit = NULL,
  linsolver = NULL,
  cg.max.iter = NULL,
  alpha = NULL,
  thread.ratio = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  random.state = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  parameter.range = NULL,
  parameter.values = NULL,
  reduction.rate = NULL,
  min.resource.rate = NULL,
  aggressive.elimination = NULL
)

Arguments

data

DataFrame
Input data for ALS model training. It must contain the following three columns:

user name/ID column.
item name/ID column.
column of user feedback for item.

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

used.cols

list/vector of character, optional
Specifies the three columns of data that are used for training ALS model.
Should be arranged in the order of: user, item and feedback.
Otherwise, the list/vector must be named, shown as follows:

used.cols <- list(user = xxx, item = xxx, feedback = xxx)

Defaults to the first three non-ID columns if not provided.

factors

integer, optional
Number of factor vectors in the matrix decomposition model of ALS.
Defaults to 8.

lambda

double, optional
Amount of penalization applied to the L2 regularization of the decomposed factors.
Defaults to 1e-2.

max.iter

integer, optional
Maximum number of iterations for the ALS algorithm.
Defaults to 20.

tol

double, optional
Specifies the exit threshold: if the value of the cost function has decreased by less than this value since the last check, the algorithm exits.
Must be no less than 0, where 0 means the cost function is not checked and the algorithm exits only after reaching the maximum number of iterations.
Defaults to 0.

exit.interval

integer, optional
Specifies the interval between consecutive checks of the exit criterion (i.e. tolerance).
A larger number means fewer additional evaluations of the cost function.
Valid only when tol is nonzero.
Defaults to 5.
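
The interplay of factors, lambda, max.iter, tol and exit.interval can be illustrated with a toy explicit-feedback ALS loop in base R. This is only a sketch under stated assumptions: the function als_sketch, the form of the cost function and the random initialization are illustrative and are not taken from the PAL implementation.

```r
# Toy explicit-feedback ALS in base R (illustrative sketch, not PAL's code).
# R is a user-by-item rating matrix with NA for unobserved entries.
als_sketch <- function(R, factors = 2, lambda = 1e-2, max.iter = 20,
                       tol = 0, exit.interval = 5, seed = 1) {
  set.seed(seed)
  observed <- !is.na(R)
  n.user <- nrow(R)
  n.item <- ncol(R)
  U <- matrix(runif(n.user * factors), n.user, factors)  # user factors
  V <- matrix(runif(n.item * factors), n.item, factors)  # item factors
  reg <- lambda * diag(factors)

  # Assumed cost: squared error on observed entries plus an L2 penalty.
  cost <- function() {
    err <- (R - U %*% t(V))[observed]
    sum(err^2) + lambda * (sum(U^2) + sum(V^2))
  }

  last.cost <- cost()
  iter <- 0
  for (iter in seq_len(max.iter)) {
    # Fix V and solve one small ridge regression per user.
    for (u in seq_len(n.user)) {
      idx <- which(observed[u, ])
      if (length(idx) == 0) next
      Vu <- V[idx, , drop = FALSE]
      U[u, ] <- solve(crossprod(Vu) + reg, crossprod(Vu, R[u, idx]))
    }
    # Fix U and solve one small ridge regression per item.
    for (i in seq_len(n.item)) {
      idx <- which(observed[, i])
      if (length(idx) == 0) next
      Ui <- U[idx, , drop = FALSE]
      V[i, ] <- solve(crossprod(Ui) + reg, crossprod(Ui, R[idx, i]))
    }
    # The cost is evaluated only every exit.interval iterations,
    # and only when tol is nonzero.
    if (tol > 0 && iter %% exit.interval == 0) {
      current <- cost()
      if (last.cost - current < tol) break
      last.cost <- current
    }
  }
  list(U = U, V = V, iterations = iter, cost = cost())
}

# Small made-up rating matrix (4 users, 4 items).
R <- matrix(c( 5,  4, NA,  1,
               4, NA,  1,  1,
               1,  1, NA,  5,
              NA,  1,  5,  4), nrow = 4, byrow = TRUE)
fit <- als_sketch(R, tol = 1e-6, exit.interval = 5)
fit$iterations
```

With tol = 0 the loop never evaluates the cost during training and always runs for max.iter iterations, which matches the behavior described for the default setting.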

implicit

logical, optional
Specifies whether to train the ALS model implicitly (TRUE) or explicitly (FALSE).
Defaults to FALSE.

linsolver

c("cholesky", "cg"), optional
Specifies the solver for solving the corresponding linear systems in ALS model.
Defaults to "cholesky", while "cg" is recommended when factors is large.

cg.max.iter

integer, optional
Specifies the maximum number of iterations for solving a linear system using the "cg" solver.
Valid only when linsolver is "cg".
Defaults to 3.

alpha

numeric, optional
Specifies a value when computing the confidence level in implicit ALS.
Valid only when implicit is TRUE.
Defaults to 1.0.
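
As a rough illustration of the role of alpha, one widely used formulation of implicit-feedback ALS turns each raw feedback value r into a binary preference and a confidence weight 1 + alpha * r. Whether PAL uses exactly this formula is an assumption here, and the helper implicit_confidence is hypothetical.

```r
# Sketch of a common confidence weighting for implicit feedback.
# Assumption: confidence = 1 + alpha * r; PAL's exact formula may differ.
implicit_confidence <- function(r, alpha = 1.0) {
  list(
    preference = as.numeric(r > 0),  # 1 if any interaction was observed
    confidence = 1 + alpha * r       # grows linearly with the feedback value
  )
}

# Made-up interaction counts for three user-item pairs.
weights <- implicit_confidence(c(0, 1, 5), alpha = 2)
print(weights)
```

Under this formulation, a larger alpha makes observed interactions count more heavily relative to unobserved ones.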

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

resampling.method

character, optional
Specifies the resampling method for model evaluation or parameter selection.
Valid options include:
"cv", "bootstrap", "cv_sha", "bootstrap_sha", "cv_hyperband", "bootstrap_hyperband".
If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.

evaluation.metric

character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Must be specified together with "resampling.method" to activate model evaluation or parameter selection.
Currently the only valid option is "rmse".
Defaults to "rmse".

fold.num

integer, optional
Specifies the fold number for cross-validation (cv).
Mandatory and valid only when resampling.method is specified with prefix "cv" (i.e. "cv", "cv_sha" and "cv_hyperband").
Defaults to 1.

repeat.times

numeric, optional
Specifies the number of repeat times for resampling.
Defaults to 1.

param.search.strategy

c('grid', 'random'), optional
Specifies the parameter search strategy to activate parameter selection.
If resampling.method is either "cv_hyperband" or "bootstrap_hyperband", defaults to "random" and cannot be changed; otherwise there is no default value.

random.search.times

integer, optional
Specifies the number of times to randomly select candidate parameters for selection. Mandatory and valid only when param.search.strategy is "random".

random.state

integer, optional
Specifies the seed for random number generator.
0 means using current system time as the seed.

timeout

integer, optional
Specifies the maximum running time for model evaluation or parameter selection, in seconds. No timeout when 0 is specified.

progress.indicator.id

character, optional
Sets an ID of progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.

parameter.range

list, optional
Specifies the range of the following parameters for parameter selection:
factors, lambda, alpha.
Each parameter range should be specified by 3 numbers in the form of c(start, step, end).
Example:
parameter.range <- list(factors = c(10, 1, 20))
If param.search.strategy is 'random', then step has no effect and thus can be omitted.
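
To make the c(start, step, end) form concrete, the following base R sketch expands each range into explicit candidate values. It assumes both endpoints are included, as with seq(); the helper expand_range and the alpha range are illustrative only.

```r
# Expand a c(start, step, end) triple into explicit candidate values.
# Assumption: both endpoints are included, as with base R's seq().
expand_range <- function(rng) {
  stopifnot(length(rng) == 3, rng[2] > 0, rng[1] <= rng[3])
  seq(from = rng[1], to = rng[3], by = rng[2])
}

parameter.range <- list(factors = c(10, 1, 20), alpha = c(1, 0.5, 3))
candidates <- lapply(parameter.range, expand_range)
print(candidates)
```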

parameter.values

list, optional
Specifies the candidate values of the following parameters for parameter selection:
factors, lambda, alpha.
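
As a sketch of how such a list translates into candidates for a grid search, expand.grid() from base R enumerates every combination of the listed values. The specific values below are made up for illustration.

```r
# Illustrative candidate values; any subset of factors, lambda and alpha may be listed.
parameter.values <- list(factors = c(4, 8, 16), lambda = c(0.01, 0.1))

# An exhaustive grid contains one row per combination of the listed values.
grid <- expand.grid(parameter.values)
print(grid)
n.candidates <- nrow(grid)
```

The number of combinations is the product of the lengths of the individual value vectors, so long lists for several parameters quickly become expensive to evaluate.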

reduction.rate

numeric, optional
Specifies the reduction rate of available size of hyper-parameter candidates.
For each round, the available parameter candidate size will be divided by the value of this parameter. Thus a valid value for this parameter must be greater than 1.0.
Valid only when parameter selection is activated and resampling.method is specified with suffix "sha" or "hyperband".
Defaults to 3.0.
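
The effect of reduction.rate on the number of surviving candidates per round can be sketched in base R as follows. Rounding down and stopping at a single candidate are assumptions of this sketch, and sha_schedule is a hypothetical helper, not part of the package.

```r
# Number of candidates kept in each successive-halving round.
# Assumptions: sizes are rounded down and the search stops at one candidate.
sha_schedule <- function(n.candidates, reduction.rate = 3.0) {
  stopifnot(n.candidates >= 1, reduction.rate > 1)
  sizes <- n.candidates
  while (tail(sizes, 1) > 1) {
    sizes <- c(sizes, max(1, floor(tail(sizes, 1) / reduction.rate)))
  }
  sizes
}

sha_schedule(27)  # 27 9 3 1
```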

min.resource.rate

numeric, optional
Specifies the minimum resource rate that should be used in SHA or hyperband iteration.
Valid only when parameter selection is activated and resampling.method is specified with suffix "sha" or "hyperband".
Defaults to 0.0.

aggressive.elimination

logical, optional
Specifies whether to apply aggressive elimination in the successive-halving algorithm.
When set to TRUE, it will eliminate more parameter candidates than expected (as defined via reduction.rate).
This can improve run-time performance but could result in a sub-optimal hyper-parameter candidate.
Valid only when resampling.method is specified with suffix "sha".
Defaults to FALSE.

Value

An R6 object of class "ALS" with the following attributes and methods:
Attributes

model.meta: DataFrame
ALS model metadata content.
model.map: DataFrame
ALS model map content.
model.factors: DataFrame
ALS model decomposition factors.
iter.info: DataFrame
Information of ALS iterations.
statistics: DataFrame
Statistical information of the ALS model.
optim.param: DataFrame
Optimal parameters selected. Available only when parameter selection is triggered.

Methods

CreateModelState(model=NULL, algorithm=NULL, func=NULL, state.description="ModelState", force=FALSE)

Usage:


   > als <- hanaml.ALS(data=df)
   > als$CreateModelState()

Arguments:

model: DataFrame
DataFrame containing the model for parsing.
Defaults to self$model.
algorithm: character
Specifies the PAL algorithm associated with model.
Defaults to self$pal.algorithm.
func: character
Specifies the functionality for Unified Classification/Regression.
Valid only for object instance of R6Class "UnifiedClassification" or "UnifiedRegression".
Defaults to self$func.
state.description: character
A summary string for the generated model state.
Defaults to "ModelState".
force: logical
Specifies whether or not to replace the existing state for model.
Defaults to FALSE.

After calling this method, an attribute state that contains the parsed info for model shall be assigned to the corresponding R6 object.

DeleteModelState(state=NULL)

Usage:
Assuming we have trained a hanaml model and created its model state, like the following:


   > als <- hanaml.ALS(data=df)
   > als$CreateModelState()

After using the model state for real-time scoring, we can delete the state by calling:


   > als$DeleteModelState()

Arguments:

state: DataFrame
DataFrame containing the state info.
Defaults to self$state.

After calling this method, the specified model state shall be cleaned up and associated memory be released.

Examples

Input DataFrame data:


> data$Collect()
   USER     MOVIE RATING
1     A    Movie1    4.8
2     A    Movie2    4.0
3     A    Movie4    4.0
4     A    Movie5    4.0
5     A    Movie6    4.8
6     A    Movie8    3.8
7     A Bad_Movie    2.5
8     B    Movie2    4.8
......
35    E    Movie6    4.2
36    E    Movie7    3.5
37    E    Movie8    3.5

Call the function:

als <- hanaml.ALS(data = data,
                  factors = 2,
                  lambda = 1e-2,
                  max.iter = 20,
                  thread.ratio = 0,
                  random.state = 1)

Output:


> als$model.map$Collect()
   ID       MAP
1   0         A
2   1         B
3   2         C
4   3         D
5   4         E
6   5    Movie1
7   6    Movie2
8   7    Movie4
9   8    Movie5
10  9    Movie6
11 10    Movie8
12 11 Bad_Movie
13 12    Movie3
14 13    Movie7

> als$iter.info$Collect()
  ITERATION                  COST               RMSE
1        20   0.14724755464106934 0.1086315164152475
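
Training can also be combined with parameter selection by passing the resampling arguments in the same call. The following is a sketch only: the wrapper name train_als_cv is hypothetical, fold.num = 5 and the candidate values are illustrative choices, and actually running it requires a live connection and a DataFrame such as data above.

```r
# Hypothetical wrapper around hanaml.ALS() that enables grid search
# with cross-validation. Calling it needs a HANA DataFrame as input.
train_als_cv <- function(data) {
  hanaml.ALS(data = data,
             max.iter = 20,
             thread.ratio = 0,
             random.state = 1,
             resampling.method = "cv",
             evaluation.metric = "rmse",
             fold.num = 5,                      # illustrative choice
             param.search.strategy = "grid",
             parameter.values = list(factors = c(2, 4, 8),   # illustrative
                                     lambda = c(0.01, 0.1)))
}

# Intended usage once a DataFrame is available:
#   als.cv <- train_als_cv(data)
#   als.cv$optim.param$Collect()
```

Since parameter selection is triggered in this call, the selected values would be expected in the optim.param attribute.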
