hanaml.ALS.Rd
Alternating least squares (ALS) is a powerful matrix factorization algorithm for building both explicit and implicit feedback based recommender systems.
hanaml.ALS(
data = NULL,
key = NULL,
used.cols = NULL,
factors = NULL,
lambda = NULL,
max.iter = NULL,
tol = NULL,
exit.interval = NULL,
implicit = NULL,
linsolver = NULL,
cg.max.iter = NULL,
alpha = NULL,
thread.ratio = NULL,
resampling.method = NULL,
evaluation.metric = NULL,
fold.num = NULL,
repeat.times = NULL,
param.search.strategy = NULL,
random.search.times = NULL,
random.state = NULL,
timeout = NULL,
progress.indicator.id = NULL,
parameter.range = NULL,
parameter.values = NULL,
reduction.rate = NULL,
min.resource.rate = NULL,
aggressive.elimination = NULL
)
DataFrame
Input data for ALS model training. It must contain the following three columns:
user name/ID column.
item name/ID column.
column of user feedback for item.
character, optional
Name of the ID column.
If not provided, the data is assumed to have no ID column.
No default value.
list/vector of character, optional
Specifies the three columns of data that are used for training ALS model.
Should be arranged in the order of: user, item and feedback.
Otherwise, the list/vector must be named, shown as follows:
used.cols <- list(user = xxx, item = xxx, feedback = xxx)
Defaults to the first three non-ID columns if not provided.
integer, optional
Number of factor vectors in the matrix decomposition model of ALS.
Defaults to 8.
double, optional
Strength of the L2 regularization penalty applied to the decomposed factors.
Defaults to 1e-2.
integer, optional
Maximum number of iterations for the ALS algorithm.
Defaults to 20.
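To make the roles of factors, lambda and max.iter concrete, the following is a minimal base-R sketch of explicit-feedback ALS on a small local matrix. It is illustrative only and is not the PAL implementation: the function name als_sketch, the random initialization, and the use of a plain ridge term lambda * I are assumptions made for this example.

```r
# Illustrative sketch only; als_sketch is a hypothetical helper, not part of hanaml.
als_sketch <- function(R, factors = 2, lambda = 1e-2, max.iter = 20, seed = 1) {
  set.seed(seed)
  n_users <- nrow(R)
  n_items <- ncol(R)
  observed <- !is.na(R)                      # NA marks a missing rating
  U <- matrix(rnorm(n_users * factors, sd = 0.1), n_users, factors)
  V <- matrix(rnorm(n_items * factors, sd = 0.1), n_items, factors)
  ridge <- lambda * diag(factors)            # L2 penalty (assumed form)
  for (iter in seq_len(max.iter)) {
    # Fix item factors, solve a small ridge regression per user.
    for (u in seq_len(n_users)) {
      idx <- which(observed[u, ])
      if (length(idx) == 0) next
      Vu <- V[idx, , drop = FALSE]
      U[u, ] <- solve(crossprod(Vu) + ridge, crossprod(Vu, R[u, idx]))
    }
    # Fix user factors, solve a small ridge regression per item.
    for (i in seq_len(n_items)) {
      idx <- which(observed[, i])
      if (length(idx) == 0) next
      Ui <- U[idx, , drop = FALSE]
      V[i, ] <- solve(crossprod(Ui) + ridge, crossprod(Ui, R[idx, i]))
    }
  }
  pred <- U %*% t(V)
  rmse <- sqrt(mean((R[observed] - pred[observed])^2))
  list(U = U, V = V, rmse = rmse)
}

# Three users, three items, NA where no rating exists.
ratings <- matrix(c(4.8, 4.0, NA,
                    4.8, NA,  3.5,
                    NA,  4.2, 2.5), nrow = 3, byrow = TRUE)
fit <- als_sketch(ratings, factors = 2, lambda = 1e-2, max.iter = 20)
```

Each inner solve is a factors-by-factors linear system, which is why the choice of linear solver matters more as factors grows.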
double, optional
Specifies the exit threshold: if the value of the cost function has
decreased by less than this value since the last check, the algorithm exits.
Should be no less than 0, where 0 means not checking the value of cost function and
the algorithm only exits when reaching the maximum number of iterations.
Defaults to 0.
integer, optional
Specifies the interval between consecutive checks of the exit criterion (i.e. tolerance).
A larger number means fewer additional evaluations of the cost function.
Valid only when tol is nonzero.
Defaults to 5.
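The interplay between max.iter, tol and exit.interval can be sketched in plain R as below. This is a hedged illustration, not PAL's code: the helper iterate_with_exit_check and the callback cost_at are hypothetical, and checking at every multiple of exit.interval is an assumption about when the checks occur.

```r
# Hypothetical helper: returns the iteration at which the loop would stop.
# cost_at(iter) stands in for evaluating the ALS cost function after `iter` sweeps.
iterate_with_exit_check <- function(cost_at, max.iter = 20, tol = 0, exit.interval = 5) {
  last_cost <- Inf
  for (iter in seq_len(max.iter)) {
    # ... one ALS sweep over users and items would run here ...
    if (tol > 0 && iter %% exit.interval == 0) {
      cost <- cost_at(iter)
      # Exit when the cost has decreased by less than tol since the last check.
      if (last_cost - cost < tol) return(iter)
      last_cost <- cost
    }
  }
  max.iter  # tol = 0: the cost is never checked, so all iterations run
}

toy_cost <- function(iter) 1 / iter   # made-up, steadily decreasing cost
stopped_early <- iterate_with_exit_check(toy_cost, max.iter = 20, tol = 0.05, exit.interval = 5)
ran_to_end    <- iterate_with_exit_check(toy_cost, max.iter = 20, tol = 0)
```

With the toy cost above, the checks at iterations 5, 10 and 15 see decreases of infinity, 0.1 and about 0.033, so the loop stops at iteration 15, while tol = 0 runs all 20 iterations.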
logical, optional
Specifies whether to train the ALS model implicitly (TRUE) or explicitly (FALSE).
Defaults to FALSE.
c("cholesky", "cg"), optional
Specifies the solver for solving the corresponding linear systems in ALS model.
Defaults to "cholesky", while "cg" is recommended when factors is large.
integer, optional
Specifies the maximum number of iterations for solving a linear system using the "cg" solver.
Valid only when linsolver is "cg".
Defaults to 3.
numeric, optional
Specifies a value when computing the confidence level in implicit ALS.
Valid only when implicit is TRUE.
Defaults to 1.0.
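As a rough illustration of what alpha controls, a common formulation in the implicit-feedback ALS literature turns each raw feedback value r into a binary preference and a confidence weight 1 + alpha * r. Whether PAL uses exactly this formula is an assumption here; the variable names below are chosen only for this sketch.

```r
# Assumed formulation (not confirmed for PAL): confidence = 1 + alpha * feedback.
alpha <- 1.0
feedback <- c(0, 1, 3, 10)                 # e.g. view or purchase counts
preference <- as.numeric(feedback > 0)     # 1 if any interaction was observed
confidence <- 1 + alpha * feedback         # weight given to each preference
data.frame(feedback, preference, confidence)
```

Under this formulation, a larger alpha makes frequently observed interactions count more heavily relative to unobserved ones.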
double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads.
Values outside this range are ignored.
Defaults to 0.
character, optional
Specifies the resampling method for model evaluation or parameter selection.
Valid options include:
"cv", "bootstrap", "cv_sha", "bootstrap_sha", "cv_hyperband",
"bootstrap_hyperband".
If no value is specified for this parameter, neither model evaluation
nor parameter selection is activated.
character, optional
Specifies the evaluation metric for model evaluation or parameter selection.
Must be specified together with "resampling.method" to activate model evaluation
or parameter selection.
Currently the only valid option is "rmse".
Defaults to "rmse".
integer, optional
Specifies the fold number for cross-validation (cv).
Mandatory and valid only when resampling.method is specified with the
prefix "cv" (i.e. "cv", "cv_sha" and "cv_hyperband").
Defaults to 1.
numeric, optional
Specifies the number of times resampling is repeated.
Defaults to 1.
c('grid', 'random'), optional
Specifies the parameter search strategy to activate parameter selection.
Defaults to "random" and cannot be changed if resampling.method is
either "cv_hyperband" or "bootstrap_hyperband"; otherwise there is no default value.
integer, optional
Specifies the number of times to randomly select candidate parameters for selection.
Mandatory and valid only when param.search.strategy is "random".
integer, optional
Specifies the seed for random number generator.
0 means using current system time as the seed.
integer, optional
Specifies the maximum running time, in seconds, for model evaluation or parameter selection.
No timeout when 0 is specified.
character, optional
Sets the ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.
list, optional
Specifies the range of the following parameters for parameter selection: factors, lambda, alpha.
A parameter range should be specified by 3 numbers in the form of c(start, step, end).
Example:
parameter.range <- list(factors = c(10, 1, 20))
If param.search.strategy is 'random', then step has no effect
and thus can be omitted.
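To see how many candidates a grid search would have to evaluate, the c(start, step, end) triples can be expanded locally with base R. This is only a sketch for sizing the search: the helper expand_range is hypothetical, and treating end as inclusive is an assumption about how the ranges are interpreted.

```r
# Hypothetical helper: expand a c(start, step, end) triple into candidate values.
expand_range <- function(r) seq(from = r[1], to = r[3], by = r[2])

parameter.range <- list(factors = c(10, 1, 20),
                        alpha   = c(1, 1, 3))
candidates <- lapply(parameter.range, expand_range)
grid <- expand.grid(candidates)   # every combination of the candidate values
n_candidates <- nrow(grid)        # 11 values of factors x 3 values of alpha = 33
```

The candidate count grows multiplicatively with each added parameter, which is worth checking before choosing 'grid' over 'random'.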
list, optional
Specifies values of the following parameters for parameter selection: factors, lambda, alpha.
numeric, optional
Specifies the reduction rate of available size of hyper-parameter candidates.
For each round, the available parameter candidate size will be divided by value of this parameter.
Thus a valid value for this parameter must be greater than 1.0.
Valid only when parameter selection is activated and resampling.method
is specified with suffix "sha" or "hyperband".
Defaults to 3.0.
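The effect of reduction.rate on the number of surviving candidates per round can be sketched as follows. The helper sha_schedule is hypothetical, and rounding down with floor() is an assumption; the actual rounding used by the successive-halving implementation may differ.

```r
# Hypothetical helper: candidate counts per successive-halving round.
sha_schedule <- function(n_candidates, reduction.rate = 3.0) {
  stopifnot(reduction.rate > 1)   # the documented constraint on reduction.rate
  sizes <- n_candidates
  while (tail(sizes, 1) > 1) {
    # Divide the surviving candidates by reduction.rate each round (floor is assumed).
    sizes <- c(sizes, max(1, floor(tail(sizes, 1) / reduction.rate)))
  }
  sizes
}

schedule <- sha_schedule(27, reduction.rate = 3.0)
schedule  # 27 9 3 1
```

A larger reduction.rate discards candidates faster, giving fewer rounds at the cost of judging each candidate on less evidence.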
numeric, optional
Specifies the minimum resource rate that should be used in SHA or hyperband iteration.
Valid only when parameter selection is activated and resampling.method
is specified with suffix "sha" or "hyperband".
Defaults to 0.0.
logical, optional
Specifies whether to perform aggressive elimination in the successive-halving algorithm.
When set to TRUE, it will eliminate more parameter candidates than
expected (as defined via reduction.rate).
This can enhance run-time performance but could result in a sub-optimal hyper-parameter candidate.
Valid only when resampling.method is specified with suffix "sha".
Defaults to FALSE.
An R6 object of class "ALS" with the following attributes and methods:
Attributes
model.meta: DataFrame
ALS model metadata content.
model.map: DataFrame
ALS model map content.
model.factors: DataFrame
ALS model decomposition factors.
iter.info: DataFrame
Information of ALS iterations.
statistics: DataFrame
Statistical information of the ALS model.
optim.param: DataFrame
Optimal parameters selected.
Available only when parameter selection is triggered.
Methods
CreateModelState(model=NULL, algorithm=NULL, func=NULL, state.description="ModelState", force=FALSE)
Usage:
> als <- hanaml.ALS(data=df)
> als$CreateModelState()
Arguments:
model: DataFrame
DataFrame containing the model for parsing.
Defaults to self$model.
algorithm: character
Specifies the PAL algorithm associated with model.
Defaults to self$pal.algorithm.
func: character
Specifies the functionality for Unified Classification/Regression.
Valid only for object instance of R6Class "UnifiedClassification" or "UnifiedRegression".
Defaults to self$func.
state.description: character
A summary string for the generated model state.
Defaults to "ModelState".
force: logical
Specifies whether or not to replace the existing state for model.
Defaults to FALSE.
After calling this method, an attribute state that contains the parsed info for model
shall be assigned to the corresponding R6 object.
DeleteModelState(state=NULL)
Usage:
Assuming we have trained a hanaml model and created its model state, like the following:
> als <- hanaml.ALS(data=df)
> als$CreateModelState()
After using the model state for real-time scoring, we can delete the state by calling:
> als$DeleteModelState()
Arguments:
state: DataFrame
DataFrame containing the state info.
Defaults to self$state.
After calling this method, the specified model state is cleaned up and the associated memory is released.
Input DataFrame data:
> data$Collect()
USER MOVIE RATING
1 A Movie1 4.8
2 A Movie2 4.0
3 A Movie4 4.0
4 A Movie5 4.0
5 A Movie6 4.8
6 A Movie8 3.8
7 A Bad_Movie 2.5
8 B Movie2 4.8
......
35 E Movie6 4.2
36 E Movie7 3.5
37 E Movie8 3.5
Call the function:
als <- hanaml.ALS(data = data,
factors = 2,
lambda = 1e-2,
max.iter = 20,
thread.ratio = 0,
random.state = 1)
Output:
> als$model.map$Collect()
ID MAP
1 0 A
2 1 B
3 2 C
4 3 D
5 4 E
6 5 Movie1
7 6 Movie2
8 7 Movie4
9 8 Movie5
10 9 Movie6
11 10 Movie8
12 11 Bad_Movie
13 12 Movie3
14 13 Movie7
> als$iter.info$Collect()
ITERATION COST RMSE
1 20 0.14724755464106934 0.1086315164152475