Alternating least squares (ALS) is a matrix factorization algorithm for building recommender systems from either explicit or implicit feedback.

hanaml.ALS(
  data = NULL,
  key = NULL,
  used.cols = NULL,
  factors = NULL,
  lambda = NULL,
  max.iter = NULL,
  tol = NULL,
  exit.interval = NULL,
  implicit = NULL,
  linsolver = NULL,
  cg.max.iter = NULL,
  alpha = NULL,
  thread.ratio = NULL,
  resampling.method = NULL,
  evaluation.metric = NULL,
  fold.num = NULL,
  repeat.times = NULL,
  param.search.strategy = NULL,
  random.search.times = NULL,
  random.state = NULL,
  timeout = NULL,
  progress.indicator.id = NULL,
  parameter.range = NULL,
  parameter.values = NULL
)

Arguments

data

DataFrame
Input data for ALS model training. It must contain the following three columns:

  • user name/ID column

  • item name/ID column

  • feedback column (the user's feedback on the item)

key

character, optional
Name of the ID column. If not provided, the data is assumed to have no ID column.
No default value.

used.cols

list/vector of character, optional
Specifies the three columns of data that are used for training the ALS model.
If unnamed, they should be arranged in the order: user, item, feedback.
Otherwise, the list/vector must be named as follows:

  • used.cols <- list(user = xxx, item = xxx, feedback = xxx)

Defaults to the first three non-ID columns if not provided.

factors

integer, optional
Length of the factor vectors (i.e. the number of latent factors) in the matrix decomposition model of ALS.
Defaults to 8.

lambda

double, optional
Weight of the L2 regularization penalty applied to the decomposed factors.
Defaults to 1e-2.

max.iter

integer, optional
Maximum number of iterations for the ALS algorithm.
Defaults to 20.
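
To show how factors, lambda and max.iter interact, the following base R sketch alternates between solving a ridge regression for every user (item factors held fixed) and for every item (user factors held fixed). It is a hypothetical illustration, not the PAL implementation: the helper als_explicit, the random initialization and the exact form of the regularized loss are all assumptions, and the ratings are made up.

```r
# Illustrative explicit-feedback ALS (hypothetical helper, not PAL's code).
# Assumed loss over observed ratings only:
#   sum (r_ui - p_u . q_i)^2 + lambda * (sum ||p_u||^2 + sum ||q_i||^2)
als_explicit <- function(R, factors = 8, lambda = 1e-2, max.iter = 20, seed = 1) {
  set.seed(seed)
  obs <- !is.na(R)                                                   # observed-rating mask
  P <- matrix(rnorm(nrow(R) * factors, sd = 0.1), nrow(R), factors)  # user factors
  Q <- matrix(rnorm(ncol(R) * factors, sd = 0.1), ncol(R), factors)  # item factors
  I_k <- diag(factors)
  for (it in seq_len(max.iter)) {
    # Hold Q fixed: each user row is an independent ridge regression
    for (u in seq_len(nrow(R))) {
      idx <- which(obs[u, ])
      Qu <- Q[idx, , drop = FALSE]
      P[u, ] <- solve(crossprod(Qu) + lambda * I_k, crossprod(Qu, R[u, idx]))
    }
    # Hold P fixed: each item column is an independent ridge regression
    for (i in seq_len(ncol(R))) {
      idx <- which(obs[, i])
      Pi <- P[idx, , drop = FALSE]
      Q[i, ] <- solve(crossprod(Pi) + lambda * I_k, crossprod(Pi, R[idx, i]))
    }
  }
  pred <- P %*% t(Q)
  list(P = P, Q = Q, rmse = sqrt(mean((R[obs] - pred[obs])^2)))
}

# Hypothetical 3-user x 4-item ratings; NA marks a missing rating
R <- matrix(c(4.8, 4.0,  NA, 2.5,
               NA, 4.8, 3.0,  NA,
              4.0,  NA, 3.5, 2.0), nrow = 3, byrow = TRUE)
fit <- als_explicit(R, factors = 2, lambda = 1e-2, max.iter = 20)
fit$rmse   # training RMSE on the observed entries
```

Each half-step has a closed-form solution, which is why ALS is cheap per iteration; a larger lambda shrinks the factors toward zero.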

tol

double, optional
Specifies the exit threshold: if the cost function has decreased by less than this value since the last check, the algorithm exits.
Should be no less than 0; 0 means the cost function is never checked and the algorithm exits only after reaching the maximum number of iterations.
Defaults to 0.

exit.interval

integer, optional
Specifies the interval, in iterations, between consecutive checks of the exit criterion (i.e. tol).
A larger value means fewer additional evaluations of the cost function.
Valid only when tol is nonzero.
Defaults to 5.
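
The exit rule described by tol and exit.interval can be sketched as follows. This assumes the cost is checked at iterations exit.interval, 2 * exit.interval, ..., and that the first check only records a baseline; the helper exit_iteration and the cost trace are hypothetical.

```r
# Hypothetical sketch of the tol / exit.interval rule (assumed check schedule).
exit_iteration <- function(costs, tol = 0, exit.interval = 5, max.iter = length(costs)) {
  if (tol == 0) return(max.iter)     # tol = 0: never check, run all iterations
  last_check <- Inf                  # assumption: first check only sets a baseline
  for (it in seq_len(max.iter)) {
    if (it %% exit.interval == 0) {
      if (last_check - costs[it] < tol) return(it)   # decrease too small: exit
      last_check <- costs[it]
    }
  }
  max.iter
}

costs <- 1 / seq_len(20)   # hypothetical, steadily decreasing cost trace
exit_iteration(costs, tol = 0.05, exit.interval = 5)   # checks at 5, 10, 15
exit_iteration(costs)                                  # tol = 0: runs to 20
```

With this trace the decrease between iterations 10 and 15 (about 0.033) falls below 0.05, so the sketch stops at iteration 15.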

implicit

logical, optional
Specifies whether to train the ALS model on implicit (TRUE) or explicit (FALSE) feedback.
Defaults to FALSE.

linsolver

("cholesky", "cg"), optional
Specifies the solver for the linear systems in the ALS model: "cholesky" (Cholesky decomposition) or "cg" (conjugate gradient).
Defaults to "cholesky"; "cg" is recommended when factors is large.

cg.max.iter

integer, optional
Specifies the maximum number of iterations for solving a linear system using the "cg" solver.
Valid only when linsolver is "cg".
Defaults to 3.
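
Both solvers target the same small symmetric positive definite system per user or item, (V'V + lambda I) x = V'r. The sketch below, with hypothetical random data and a hypothetical helper cg_solve, solves it exactly via a Cholesky factorization and approximately with a capped number of conjugate-gradient steps, which is the role cg.max.iter plays. In ALS, CG is often warm-started from the previous factors so that a few steps suffice; whether PAL does this is an assumption.

```r
set.seed(42)
k <- 6; lambda <- 1e-2
V <- matrix(rnorm(30 * k), 30, k)      # hypothetical fixed factor matrix
r <- rnorm(30)                         # hypothetical feedback vector
A <- crossprod(V) + lambda * diag(k)   # symmetric positive definite
b <- crossprod(V, r)

# "cholesky": exact solve using A = t(U) %*% U
U <- chol(A)
x_chol <- backsolve(U, forwardsolve(t(U), b))

# "cg": conjugate gradient, stopped after at most max.iter steps
cg_solve <- function(A, b, max.iter = 3, x0 = rep(0, length(b))) {
  x <- x0
  res <- b - A %*% x
  p <- res
  rs_old <- sum(res^2)
  for (i in seq_len(max.iter)) {
    Ap <- A %*% p
    step <- rs_old / sum(p * Ap)       # exact line search along p
    x <- x + step * p
    res <- res - step * Ap
    rs_new <- sum(res^2)
    if (sqrt(rs_new) < 1e-12) break    # already converged
    p <- res + (rs_new / rs_old) * p   # next A-conjugate direction
    rs_old <- rs_new
  }
  x
}
x_cg3 <- cg_solve(A, b, max.iter = 3)  # cheap approximation
x_cg6 <- cg_solve(A, b, max.iter = k)  # k steps: matches Cholesky up to rounding
```

The cost per CG step is one matrix-vector product, whereas Cholesky costs on the order of factors^3, which is why "cg" becomes attractive for large factors.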

alpha

numeric, optional
Specifies the value of alpha used when computing the confidence level in implicit ALS.
Valid only when implicit is TRUE.
Defaults to 1.0.
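
For implicit = TRUE, a widely used formulation (Hu, Koren and Volinsky, 2008) turns raw feedback r into a binary preference p = 1 if r > 0 (else 0) and a confidence c = 1 + alpha * r, then fits a confidence-weighted ridge regression over all items, including unobserved ones. Whether PAL uses exactly this form is an assumption; the sketch shows one user update with hypothetical data.

```r
r_u <- c(3, 0, 1, 0, 10)          # hypothetical implicit feedback (e.g. play counts)
alpha <- 1.0; lambda <- 1e-2; k <- 2
set.seed(1)
Y <- matrix(rnorm(5 * k), 5, k)   # hypothetical fixed item factors

p_u <- as.numeric(r_u > 0)        # preference: 1 if any interaction, else 0
c_u <- 1 + alpha * r_u            # confidence grows linearly with feedback
# Weighted ridge regression: (Y' C Y + lambda I) x = Y' C p
A <- crossprod(Y, c_u * Y) + lambda * diag(k)   # c_u * Y scales row i by c_u[i]
b <- crossprod(Y, c_u * p_u)
x_u <- solve(A, b)                # updated factor vector for this user
```

A larger alpha makes heavily used items dominate the fit, while zero-feedback items still contribute with confidence 1.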

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

resampling.method

character, optional
Specifies the resampling method. Valid options are "cv" and "bootstrap".
If no value is specified for this parameter, neither model evaluation nor parameter selection is activated.

evaluation.metric

character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Currently the only valid option is "rmse".
Defaults to "rmse".
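
RMSE is the square root of the mean squared difference between observed and predicted feedback. A one-line base R version, with hypothetical values, is:

```r
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

actual    <- c(4.8, 4.0, 2.5)   # hypothetical held-out ratings
predicted <- c(4.6, 4.1, 2.9)   # hypothetical model predictions
rmse(actual, predicted)          # sqrt((0.04 + 0.01 + 0.16) / 3)
```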

fold.num

integer, optional
Specifies the fold number for the cross-validation (cv). Mandatory and valid only when resampling.method is "cv".
Defaults to 1.
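
The two resampling methods can be pictured as follows. This sketch assumes that rating rows (not users) are resampled and that bootstrap evaluates on the rows left out of the sample; PAL's exact splitting scheme is not stated here. The row count 37 matches the example data in the Examples section.

```r
set.seed(1)
n <- 37          # number of rating rows in the example data
fold.num <- 5
# "cv": give each row a balanced random fold label in 1..fold.num
folds <- sample(rep_len(seq_len(fold.num), n))
fold_sizes <- tabulate(folds, nbins = fold.num)
# Round f would train on rows with folds != f and score RMSE on folds == f
test_rows_round1 <- which(folds == 1)
# "bootstrap" (assumed): draw n training rows with replacement,
# then evaluate on the rows that were never drawn (out-of-bag)
boot_idx <- sample(n, n, replace = TRUE)
oob_rows <- setdiff(seq_len(n), boot_idx)
```

With repeat.times greater than 1, the whole procedure would be rerun with fresh random splits and the metric averaged.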

repeat.times

numeric, optional
Specifies the number of times resampling is repeated.
Defaults to 1.

param.search.strategy

c('grid', 'random'), optional
Specifies the search strategy for parameter selection. If not specified, parameter selection is not triggered.

random.search.times

integer, optional
Specifies the number of times to randomly select candidate parameters for selection. Mandatory and valid only when param.search.strategy is 'random'.

random.state

integer, optional
Specifies the seed for the random number generator.
0 means the current system time is used as the seed.

timeout

integer, optional
Specifies the maximum running time, in seconds, for model evaluation or parameter selection. No timeout when 0 is specified.

progress.indicator.id

character, optional
Sets the ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.

parameter.range

list, optional
Specifies the ranges of the following parameters for parameter selection:
factors, lambda, alpha.
Each range should be specified by 3 numbers in the form c(start, step, end).
Example:
parameter.range <- list(factors = c(10, 1, 20))
If param.search.strategy is 'random', step has no effect and can therefore be omitted.
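
One way to picture how a parameter.range turns into candidate settings is sketched below. The grid expansion follows directly from the c(start, step, end) form; the uniform sampling used for 'random' is an assumed scheme, and the helper range_end is hypothetical.

```r
parameter.range <- list(factors = c(10, 5, 20), alpha = c(1, 0.5, 2))

# 'grid': expand each c(start, step, end) with seq(), then take all combinations
grid <- expand.grid(lapply(parameter.range,
                           function(r) seq(r[1], r[3], by = r[2])))
nrow(grid)   # 3 factors values x 3 alpha values = 9 candidates

# 'random' (assumed scheme): step is ignored, so c(start, end) also works;
# draw random.search.times candidates uniformly within each range
set.seed(1)
random.search.times <- 4
range_end <- function(r) r[length(r)]   # last element is always the end
candidates <- data.frame(
  factors = sample(seq(parameter.range$factors[1], range_end(parameter.range$factors)),
                   random.search.times, replace = TRUE),
  alpha = runif(random.search.times, parameter.range$alpha[1],
                range_end(parameter.range$alpha))
)
```

Grid search cost grows multiplicatively with each added parameter, which is where 'random' with a fixed random.search.times budget can help.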

parameter.values

list, optional
Specifies values of the following parameters for parameter selection:
factors, lambda, alpha.

Value

An "ALS" object with the following attributes:

  • model.meta: DataFrame
    ALS model metadata content.

  • model.map: DataFrame
    ALS model map content.

  • model.factors: DataFrame
    ALS model decomposition factors.

  • iter.info: DataFrame
    Information of ALS iterations.

  • statistics: DataFrame
    Statistical information of the ALS model.

  • optim.param: DataFrame
    Optimal parameters selected. Available only when parameter selection is triggered.

Examples

Input DataFrame data:

> data$Collect()
   USER     MOVIE RATING
1     A    Movie1    4.8
2     A    Movie2    4.0
3     A    Movie4    4.0
4     A    Movie5    4.0
5     A    Movie6    4.8
6     A    Movie8    3.8
7     A Bad_Movie    2.5
8     B    Movie2    4.8
......
35    E    Movie6    4.2
36    E    Movie7    3.5
37    E    Movie8    3.5

Call the function:

als <- hanaml.ALS(data = data,
                  factors = 2,
                  lambda = 1e-2,
                  max.iter = 20,
                  thread.ratio = 0,
                  random.state = 1)

Output:

> als$model.map$Collect()
   ID       MAP
1   0         A
2   1         B
3   2         C
4   3         D
5   4         E
6   5    Movie1
7   6    Movie2
8   7    Movie4
9   8    Movie5
10  9    Movie6
11 10    Movie8
12 11 Bad_Movie
13 12    Movie3
14 13    Movie7

> als$iter.info$Collect()
  ITERATION                  COST               RMSE
1        20   0.14724755464106934 0.1086315164152475

See also