Alternating least squares (ALS) is a matrix factorization algorithm for
building recommender systems from either explicit or implicit feedback.
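The alternating idea can be shown on a small rating matrix. With the item factors held fixed, each user's factor vector has a closed-form ridge-regression solution, and vice versa. Alternating the two steps reduces the squared error over observed ratings plus `lambda` times the squared L2 norms of the factors. The base-R sketch below assumes that textbook formulation. `als_sketch` and its arguments are hypothetical names, not part of hana.ml.r, and PAL's initialization, regularization scaling and solvers may differ.

```r
# Minimal explicit-feedback ALS sketch (illustrative; not the SAP HANA PAL implementation).
# ratings: user x item matrix with NA for unobserved entries.
# k plays the role of `factors`, lambda of `lambda`, max_iter of `max.iter`.
als_sketch <- function(ratings, k = 2, lambda = 1e-2, max_iter = 20, seed = 1) {
  set.seed(seed)
  n_users <- nrow(ratings)
  n_items <- ncol(ratings)
  X <- matrix(rnorm(n_users * k, sd = 0.1), n_users, k)  # user factor vectors (rows)
  Y <- matrix(rnorm(n_items * k, sd = 0.1), n_items, k)  # item factor vectors (rows)
  observed <- !is.na(ratings)
  ridge <- lambda * diag(k)
  for (iter in seq_len(max_iter)) {
    # Step 1: hold Y fixed; each user's factors solve a small ridge regression
    for (u in seq_len(n_users)) {
      idx <- which(observed[u, ])
      if (length(idx) == 0) next
      Yu <- Y[idx, , drop = FALSE]
      X[u, ] <- solve(crossprod(Yu) + ridge, crossprod(Yu, ratings[u, idx]))
    }
    # Step 2: hold X fixed; each item's factors solve the symmetric problem
    for (i in seq_len(n_items)) {
      idx <- which(observed[, i])
      if (length(idx) == 0) next
      Xi <- X[idx, , drop = FALSE]
      Y[i, ] <- solve(crossprod(Xi) + ridge, crossprod(Xi, ratings[idx, i]))
    }
  }
  list(user_factors = X, item_factors = Y, predicted = X %*% t(Y))
}

# Toy 3 x 3 rating matrix (made-up values); NA marks a missing rating.
ratings <- matrix(c(4.8, 4.0,  NA,
                     NA, 4.8, 3.5,
                    4.0,  NA, 3.0), nrow = 3, byrow = TRUE)
fit <- als_sketch(ratings, k = 2, lambda = 1e-2)
round(fit$predicted, 2)  # the NA cells now hold predicted ratings
```

The prediction for a (user, item) pair is the dot product of the two factor vectors. This is why `factors` controls both model capacity and the size of each per-user or per-item linear system.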
hanaml.ALS(
data = NULL,
key = NULL,
used.cols = NULL,
factors = NULL,
lambda = NULL,
max.iter = NULL,
tol = NULL,
exit.interval = NULL,
implicit = NULL,
linsolver = NULL,
cg.max.iter = NULL,
alpha = NULL,
thread.ratio = NULL,
resampling.method = NULL,
evaluation.metric = NULL,
fold.num = NULL,
repeat.times = NULL,
param.search.strategy = NULL,
random.search.times = NULL,
random.state = NULL,
timeout = NULL,
progress.indicator.id = NULL,
parameter.range = NULL,
parameter.values = NULL
)
Arguments
| data |
DataFrame
Input data for ALS model training. It must contain three columns: user, item and feedback (e.g. a rating).
|
| key |
character, optional
Name of the ID column.
If not provided, the data is assumed to have no ID column.
No default value.
|
| used.cols |
list/vector of character, optional
Specifies the three columns of data used for training the ALS model.
If unnamed, the columns should be arranged in the order: user, item, feedback.
Otherwise, the list/vector must be named.
Defaults to the first three non-ID columns if not provided. |
| factors |
integer, optional
Number of factor vectors in the matrix decomposition model of ALS.
Defaults to 8.
|
| lambda |
double, optional
Weight of the L2 regularization penalty applied to the decomposed factors.
Defaults to 1e-2.
|
| max.iter |
integer, optional
Maximum number of iterations for the ALS algorithm.
Defaults to 20.
|
| tol |
double, optional
Specifies the exit threshold, i.e. if the cost function has decreased by less than
this value since the last check, the algorithm exits.
Should be no less than 0, where 0 means not checking the value of cost function and
the algorithm only exits when reaching the maximum number of iterations.
Defaults to 0.
|
| exit.interval |
integer, optional
Specifies the interval between consecutive checks of the exit criterion (i.e. tol).
A larger value means fewer additional evaluations of the cost function.
Valid only when tol is nonzero.
Defaults to 5.
|
| implicit |
logical, optional
Specifies whether to train the ALS model on implicit (TRUE) or explicit (FALSE) feedback.
Defaults to FALSE.
|
| linsolver |
("cholesky", "cg"), optional
Specifies the solver for solving the corresponding linear systems in ALS model.
Defaults to "cholesky", while "cg" is recommended when factors is large.
|
| cg.max.iter |
integer, optional
Specifies the maximum number of iterations for solving a linear system using the "cg" solver.
Valid only when linsolver is "cg".
Defaults to 3.
|
| alpha |
numeric, optional
Specifies the alpha value used when computing the confidence level in implicit ALS.
Valid only when implicit is TRUE.
Defaults to 1.0.
|
| thread.ratio |
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads.
Values between 0 and 1 will use up to
that percentage of available threads. Values outside this
range are ignored.
Defaults to 0.
|
| resampling.method |
character, optional
Specifies the resampling method for model evaluation or parameter selection.
Valid options include: "cv", "bootstrap".
If no value is specified for this parameter, neither model evaluation
nor parameter selection is activated.
|
| evaluation.metric |
character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Currently the only valid option is "rmse".
Defaults to "rmse".
|
| fold.num |
integer, optional
Specifies the fold number for the cross-validation (cv).
Mandatory and valid only when resampling.method is "cv".
Defaults to 1.
|
| repeat.times |
numeric, optional
Specifies the number of times the resampling is repeated.
Defaults to 1.
|
| param.search.strategy |
c('grid', 'random'), optional
Specifies the method to activate parameter selection.
If not specified, parameter selection is not triggered.
|
| random.search.times |
integer, optional
Specifies the number of times to randomly select candidate parameters for selection.
Mandatory and valid only when param.search.strategy is 'random'.
|
| random.state |
integer, optional
Specifies the seed for random number generator.
0 means using current system time as the seed.
|
| timeout |
integer, optional
Specifies the maximum running time, in seconds, for model evaluation or parameter selection.
No timeout when 0 is specified.
|
| progress.indicator.id |
character, optional
Sets an ID of progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.
|
| parameter.range |
list, optional
Specifies range of the following parameters for parameter selection:
factors, lambda, alpha.
Parameter range should be specified by 3 numbers in the form of c(start, step, end).
Example:
parameter.range <- list(factors = c(10, 1, 20))
If param.search.strategy is 'random', then step has no effect
and thus can be omitted.
|
| parameter.values |
list, optional
Specifies values of the following parameters for parameter selection:
factors, lambda, alpha.
|
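The interplay of `tol`, `exit.interval` and `max.iter` can be sketched as the loop below. This reading is an assumption based only on the argument descriptions above; in particular, it assumes the first check merely records the cost. `run_with_exit_check` and `cost_at` are hypothetical names, and `cost_at` stands in for evaluating the ALS cost function.

```r
# Hypothetical sketch of the tol / exit.interval / max.iter stopping rule.
# cost_at(iter) stands in for evaluating the ALS cost function after sweep `iter`.
run_with_exit_check <- function(cost_at, max_iter = 20, tol = 0, exit_interval = 5) {
  last_cost <- Inf
  for (iter in seq_len(max_iter)) {
    # ... one ALS sweep (update user factors, then item factors) would run here ...
    if (tol > 0 && iter %% exit_interval == 0) {
      cost <- cost_at(iter)                      # evaluated only at check points
      if (last_cost - cost < tol) return(iter)   # too little progress since last check
      last_cost <- cost
    }
  }
  max_iter                                       # tol = 0: always run max_iter sweeps
}

# Toy cost curve 1 / iter: checks at 5, 10, 15; the drop from 0.1 to 0.067 is below 0.05.
run_with_exit_check(function(iter) 1 / iter, max_iter = 20, tol = 0.05)  # 15
run_with_exit_check(function(iter) 1 / iter, max_iter = 20, tol = 0)     # 20
```

A larger `exit.interval` skips more cost evaluations but can run a few sweeps past the point where progress stalled.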
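Each factor update in ALS solves a small symmetric positive-definite system A x = b, where A has size factors x factors (for explicit ALS, typically the factor cross-product plus lambda times the identity). The sketch below contrasts the two `linsolver` choices. It uses a direct Cholesky solve and a conjugate gradient (CG) solve stopped after a fixed number of iterations, which appears to be the role of `cg.max.iter`. The helper names are hypothetical. How PAL starts CG (for example, from the previous iteration's factors) is not documented here.

```r
# Direct solve: factor A = t(U) %*% U once, then two triangular solves.
solve_cholesky <- function(A, b) {
  U <- chol(A)
  backsolve(U, forwardsolve(t(U), b))
}

# Iterative solve: conjugate gradient truncated after max_iter steps.
solve_cg <- function(A, b, x0 = rep(0, length(b)), max_iter = 3, eps = 1e-12) {
  x <- x0
  r <- b - A %*% x            # residual
  p <- r                      # search direction
  rs_old <- sum(r * r)
  for (k in seq_len(max_iter)) {
    if (rs_old < eps) break   # already converged
    Ap <- A %*% p
    step <- rs_old / sum(p * Ap)
    x <- x + step * p
    r <- r - step * Ap
    rs_new <- sum(r * r)
    p <- r + (rs_new / rs_old) * p
    rs_old <- rs_new
  }
  drop(x)
}

set.seed(42)
Y <- matrix(rnorm(30), 10, 3)                 # toy factor matrix, factors = 3
A <- crossprod(Y) + 0.01 * diag(3)
b <- drop(crossprod(Y, rnorm(10)))
x_chol <- solve_cholesky(A, b)
x_cg   <- solve_cg(A, b, max_iter = 3)
max(abs(x_chol - x_cg))                       # close to 0: CG is exact in n steps for n x n
```

Cholesky costs on the order of factors^3 operations per solve, while each CG iteration costs on the order of factors^2. A few CG iterations are therefore cheaper when `factors` is large.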
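For `implicit = TRUE`, a widely used formulation comes from Hu, Koren and Volinsky (2008), "Collaborative Filtering for Implicit Feedback Datasets". It turns each feedback value r into a binary preference (1 if r > 0, else 0) and a confidence weight 1 + alpha * r. Whether PAL uses exactly this mapping is an assumption. The sketch only shows how `alpha` would scale the confidence, and `implicit_weights` is a hypothetical helper.

```r
# Hypothetical sketch of implicit-feedback weighting (Hu, Koren and Volinsky, 2008):
# preference p = 1 if r > 0 else 0, confidence c = 1 + alpha * r.
implicit_weights <- function(r, alpha = 1.0) {
  list(preference = as.numeric(r > 0),
       confidence = 1 + alpha * r)
}

clicks <- c(0, 1, 5)                      # toy interaction counts
w <- implicit_weights(clicks, alpha = 1.0)
w$preference    # 0 1 1
w$confidence    # 1 2 6
w40 <- implicit_weights(clicks, alpha = 40)
w40$confidence  # 1 41 201
```

A larger alpha makes frequently observed interactions dominate the weighted squared error relative to unobserved (zero) entries.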
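`evaluation.metric = "rmse"` refers to the root-mean-square error between held-out and predicted feedback values. The sketch below computes it and randomly assigns rows to `fold.num` balanced folds. This is a generic k-fold illustration. PAL's actual splitting scheme is not described on this page, and `rmse` and `assign_folds` are hypothetical helpers.

```r
# Root-mean-square error between actual and predicted feedback values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Balanced fold labels 1..fold_num in random order, one per row.
assign_folds <- function(n, fold_num, seed = 1) {
  set.seed(seed)
  sample(rep_len(seq_len(fold_num), n))
}

rmse(c(4.8, 4.0, 3.5), c(4.6, 4.1, 3.5))   # sqrt(0.05 / 3), about 0.129
folds <- assign_folds(37, fold_num = 5)
table(folds)                                # folds 1-2 get 8 rows, folds 3-5 get 7
```

In each round, one fold is held out for the RMSE and the model is trained on the remaining folds.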
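For grid search, a range such as `c(10, 1, 20)` presumably enumerates 10, 11, ..., 20, while `parameter.values` lists candidates directly. For random search, values are drawn between start and end. The base-R sketch below illustrates that reading. It is an assumption about the semantics, not a description of PAL internals, and `grid_candidates` and `random_candidate` are hypothetical helpers, not hana.ml.r functions.

```r
# Grid search: c(start, step, end) expands to start, start + step, ..., end.
grid_candidates <- function(rng) seq(rng[1], rng[3], by = rng[2])

# Random search: draw uniformly between start and end (the step is ignored,
# so both c(start, step, end) and c(start, end) are accepted).
random_candidate <- function(rng, integer = FALSE) {
  lo <- rng[1]
  hi <- rng[length(rng)]
  if (integer) {
    vals <- lo:hi
    vals[sample.int(length(vals), 1)]   # avoids sample()'s single-value pitfall
  } else {
    runif(1, lo, hi)
  }
}

factors_grid <- grid_candidates(c(10, 1, 20))   # 10, 11, ..., 20
set.seed(1)
factors_draw <- random_candidate(c(10, 20), integer = TRUE)

# Explicit values: every combination forms one candidate setting.
values <- list(factors = c(4, 8), lambda = c(1e-3, 1e-2, 1e-1))
candidates <- expand.grid(values)
nrow(candidates)   # 6
```

The size of the grid grows multiplicatively with each tuned parameter. This is where `param.search.strategy = 'random'` with a bounded `random.search.times` becomes useful.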
Value
An "ALS" object with the following attributes:
model.meta: DataFrame
ALS model metadata content.
model.map: DataFrame
ALS model map content.
model.factors: DataFrame
ALS model decomposition factors.
iter.info: DataFrame
Information of ALS iterations.
statistics: DataFrame
Statistical information of the ALS model.
optim.param: DataFrame
Optimal parameters selected.
Available only when parameter selection is triggered.
Examples
Input DataFrame data:
> data$Collect()
USER MOVIE RATING
1 A Movie1 4.8
2 A Movie2 4.0
3 A Movie4 4.0
4 A Movie5 4.0
5 A Movie6 4.8
6 A Movie8 3.8
7 A Bad_Movie 2.5
8 B Movie2 4.8
......
35 E Movie6 4.2
36 E Movie7 3.5
37 E Movie8 3.5
Call the function:
als <- hanaml.ALS(data = data,
factors = 2,
lambda = 1e-2,
max.iter = 20,
thread.ratio = 0,
random.state = 1)
Output:
> als$model.map$Collect()
ID MAP
1 0 A
2 1 B
3 2 C
4 3 D
5 4 E
6 5 Movie1
7 6 Movie2
8 7 Movie4
9 8 Movie5
10 9 Movie6
11 10 Movie8
12 11 Bad_Movie
13 12 Movie3
14 13 Movie7
> als$iter.info$Collect()
ITERATION COST RMSE
1 20 0.14724755464106934 0.1086315164152475
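In the `model.map` output above, users A to E receive IDs 0 to 4, and the movies follow, apparently in order of first appearance in `data`. The hypothetical `build_id_map` below reproduces that layout on a made-up toy frame. The ordering rule is inferred from this one example only, and the function is not part of hana.ml.r.

```r
# Hypothetical reconstruction of the model.map layout:
# users first, then items, each in order of first appearance, numbered from 0.
build_id_map <- function(users, items) {
  labels <- c(unique(users), unique(items))
  data.frame(ID = seq_along(labels) - 1L, MAP = labels, stringsAsFactors = FALSE)
}

toy <- data.frame(USER  = c("A", "A", "B", "B"),
                  MOVIE = c("Movie1", "Movie2", "Movie2", "Movie3"),
                  stringsAsFactors = FALSE)
build_id_map(toy$USER, toy$MOVIE)
# IDs 0-1 map to A, B; IDs 2-4 map to Movie1, Movie2, Movie3
```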
See also