FRM
- class hana_ml.algorithms.pal.recommender.FRM(solver=None, factor_num=None, init=None, random_state=None, learning_rate=None, linear_lamb=None, lamb=None, max_iter=None, sgd_tol=None, sgd_exit_interval=None, thread_ratio=None, momentum=None, resampling_method=None, evaluation_metric=None, fold_num=None, repeat_times=None, search_strategy=None, random_search_times=None, timeout=None, progress_indicator_id=None, param_values=None, param_range=None, reduction_rate=None, min_resource_rate=None, aggressive_elimination=None)
Factorized Polynomial Regression Models or Factorization Machines approach.
Factorization approaches such as matrix factorization achieve high accuracy in several important prediction problems, including recommender systems. In its basic form, matrix factorization characterizes both users and items by vectors of latent factors inferred from user-item rating patterns; a high correspondence between user and item factors leads to a recommendation. The factorization machines approach for recommender systems is more general than common factorization approaches: it characterizes latent factors not only for users and items themselves but also for side features related to them, and this additional information can make the predictions more accurate.
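As a rough illustration of the model family described above, the sketch below evaluates the textbook second-order factorization machine score for a single feature vector. The function name `fm_predict` and the toy weights are hypothetical, and this is not PAL's implementation; it only shows how one latent factor vector per feature (of length `factor_num`) lets users, items and side features interact.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               v: Sequence[Sequence[float]]) -> float:
    """Textbook second-order factorization machine score for one sample.

    x  : feature values (one-hot user, one-hot item, side features, ...)
    w0 : global bias
    w  : one linear weight per feature
    v  : one latent factor vector (length k) per feature
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        # Sum over all feature pairs i < j of v[i][f] * v[j][f] * x[i] * x[j],
        # computed in linear time per factor dimension.
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical toy model: 2 users, 2 items, 1 side feature, k = 2 factors.
x = [1.0, 0.0, 0.0, 1.0, 1.0]  # first user, second item, side feature active
w0 = 3.0
w = [0.1, -0.1, 0.2, 0.0, 0.05]
v = [[0.5, 0.1], [0.0, 0.3], [0.2, 0.2], [0.4, -0.2], [0.1, 0.6]]
score = fm_predict(x, w0, w, v)
```

With plain matrix factorization only the user and item factor vectors would interact; here the side feature contributes its own factor vector to the pairwise term.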
- Parameters:
- solver: {'sgd', 'momentum', 'nag', 'adagrad'}, optional
Specifies the method for solving the objective minimization problem.
Defaults to 'sgd'.
- factor_num: int, optional
Length of factor vectors in the model.
Defaults to 8.
- init: float, optional
Variance of the normal distribution used to initialize the model parameters.
Defaults to 1e-2.
- random_state: int, optional
Specifies the seed for the random number generator.
0: Uses the current time as the seed.
Others: Uses the specified value as the seed.
Note that due to the inherent randomness of parallel SGD, models from different training runs may differ even with the same random seed.
Defaults to 0.
- lamb: float, optional
L2 regularization of the factors.
Defaults to 1e-8.
- linear_lamb: float, optional
L2 regularization of the linear terms.
Defaults to 1e-10.
- thread_ratio: float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 0.
- max_iter: int, optional
Specifies the maximum number of iterations for the solver.
Defaults to 50.
- sgd_tol: float, optional
Exit threshold.
The algorithm exits when the cost function has not decreased by more than this threshold in sgd_exit_interval steps.
Defaults to 1e-5.
- sgd_exit_interval: int, optional
The algorithm exits when the cost function has not decreased by more than sgd_tol in sgd_exit_interval steps.
Defaults to 5.
- momentum: float, optional
The momentum factor used by the 'momentum' and 'nag' solvers.
Valid only when solver is 'momentum' or 'nag'.
Defaults to 0.9.
- resampling_method: {'cv', 'bootstrap'}, optional
Specifies the resampling method for model evaluation or parameter selection.
If not specified, neither model evaluation nor parameter selection is activated.
No default value.
- evaluation_metric: {'rmse'}, optional
Specifies the evaluation metric for model evaluation or parameter selection.
If not specified, neither model evaluation nor parameter selection is activated.
No default value.
- fold_num: int, optional
Specifies the fold number for the cross validation method.
Mandatory and valid only when resampling_method is set to 'cv'.
Defaults to 1.
- repeat_times: int, optional
Specifies the number of repeat times for resampling.
Defaults to 1.
- search_strategy: {'grid', 'random'}, optional
Specifies the method to activate parameter selection.
No default value.
- random_search_times: int, optional
Specifies the number of times to randomly select candidate parameters for selection.
Mandatory and valid only when search_strategy is set to 'random'.
No default value.
- timeout: int, optional
Specifies the maximum running time for model evaluation or parameter selection, in seconds.
No timeout when 0 is specified.
Defaults to 0.
- progress_indicator_id: str, optional
Sets an ID of the progress indicator for model evaluation or parameter selection.
No progress indicator is active if no value is provided.
No default value.
- param_values: dict or ListOfTuples, optional
Specifies values of parameters to be selected.
Input should be a dict or a list of two-element tuples, with the key/1st element being the parameter name and the value/2nd element being a list of values for selection.
Valid only when resampling_method and search_strategy are both specified.
Valid parameter names include: 'factor_num', 'lamb', 'linear_lamb', 'momentum'.
No default value.
- param_range: dict or ListOfTuples, optional
Specifies ranges of parameters to be selected.
Input should be a dict or a list of two-element tuples, with the key/1st element being the parameter name and the value/2nd element being a list of numerical values indicating the range for selection.
Valid only when resampling_method and search_strategy are both specified.
Valid parameter names include: 'factor_num', 'lamb', 'linear_lamb', 'momentum'.
No default value.
- reduction_rate: float, optional
Specifies the reduction rate in the SHA or Hyperband method.
For each round, the number of available parameter candidates is divided by the value of this parameter, so a valid value must be greater than 1.0.
Valid only when resampling_method takes one of the following values: 'cv_sha', 'bootstrap_sha', 'cv_hyperband', 'bootstrap_hyperband'.
Defaults to 3.0.
- min_resource_rate: float, optional
Specifies the minimum resource rate that should be used in an SHA or Hyperband iteration.
Valid only when resampling_method takes one of the following values: 'cv_sha', 'cv_hyperband', 'bootstrap_sha', 'bootstrap_hyperband'.
Defaults to 0.0.
- aggressive_elimination: bool, optional
Specifies whether to apply aggressive elimination while using the SHA method.
Aggressive elimination happens when the data size and the number of parameters to be searched do not match, so that many parameter candidates remain to be searched when the data size reaches its upper limit. If aggressive elimination is applied, the lower bound of the data size limit is used multiple times first to reduce the number of parameters.
Valid only when resampling_method is 'cv_sha' or 'bootstrap_sha'.
Defaults to False.
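To make the effect of reduction_rate concrete, the sketch below lists how many parameter candidates would remain after each round if the candidate count is divided by reduction_rate every round. The helper name `sha_candidate_counts` is hypothetical, and rounding down while keeping at least one candidate is an assumption of this sketch; the rounding PAL actually applies is not stated here.

```python
import math
from typing import List


def sha_candidate_counts(n_candidates: int, reduction_rate: float = 3.0) -> List[int]:
    """Candidates remaining per round when dividing by reduction_rate each round.

    Assumption: counts are rounded down and never drop below one candidate.
    """
    if reduction_rate <= 1.0:
        raise ValueError("reduction_rate must be greater than 1.0")
    counts = [n_candidates]
    while counts[-1] > 1:
        counts.append(max(1, math.floor(counts[-1] / reduction_rate)))
    return counts


counts = sha_candidate_counts(27, reduction_rate=3.0)
print(counts)  # [27, 9, 3, 1]
```

A value of 1.0 or less would never shrink the candidate set, which is why the parameter must be greater than 1.0.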
Examples
Input DataFrame:
>>> df_train.collect()
   USER   MOVIE  FEEDBACK
0     A  Movie1       4.8
1     A  Movie2       4.0
...
35    E  Movie7       3.5
36    E  Movie8       3.5
Input user dataframe for training:
>>> usr_info.collect()
   USER  USER_SIDE_FEATURE
-- There is no side information for user provided. --
Input item dataframe for training:
>>> item_info.collect()
0      MOVIE            GENRES
1     Movie1            Sci-Fi
...
8     Movie8   Sci-Fi,Thriller
9  Bad_Movie  Romance,Thriller
Create a FRM instance:
>>> frm = FRM(factor_num=2, solver='adagrad', learning_rate=0, max_iter=100, thread_ratio=0.5, random_state=1)
Perform fit():
>>> frm.fit(data=df_train, usr_info=usr_info, item_info=item_info, categorical_variable='TIMESTAMP')
>>> frm.factors_.collect().head(10)
   FACTOR_ID    FACTOR
0          0 -0.083550
1          1 -0.083654
...
8          8 -0.056534
9          9 -0.342042
Perform predict():
>>> res = frm.predict(data=df_predict, usr_info=usr_info, item_info=item_info, thread_ratio=0.5, key='ID')
>>> res.collect()
   ID USER  ITEM  PREDICTION
0   1    A  None    3.486804
1   2    A     4    3.490246
...
6   7    D     3    4.097683
7   8    E     2    2.317224
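The examples above do not exercise parameter selection. As a hedged sketch of the two input shapes accepted by param_values (a dict or a list of two-element tuples), the helpers below normalize either form and enumerate the combinations a grid search would have to consider. The helper names `to_param_dict` and `grid` and the candidate values are hypothetical, and how PAL enumerates candidates internally is not described in this section.

```python
from itertools import product
from typing import Dict, List, Tuple, Union

# Parameter names listed as valid for param_values in the parameter description.
VALID_NAMES = {"factor_num", "lamb", "linear_lamb", "momentum"}

ParamValues = Union[Dict[str, list], List[Tuple[str, list]]]


def to_param_dict(param_values: ParamValues) -> Dict[str, list]:
    """Normalize dict or list-of-tuples input to a dict, rejecting unknown names."""
    items = param_values.items() if isinstance(param_values, dict) else param_values
    result: Dict[str, list] = {}
    for name, values in items:
        if name not in VALID_NAMES:
            raise ValueError(f"unsupported parameter name: {name}")
        result[name] = list(values)
    return result


def grid(param_values: ParamValues) -> List[Dict[str, object]]:
    """Every combination of the listed values (illustration only)."""
    d = to_param_dict(param_values)
    names = list(d)
    return [dict(zip(names, combo)) for combo in product(*(d[n] for n in names))]


# The same hypothetical search space in both accepted shapes.
as_dict = {"factor_num": [2, 4, 8], "lamb": [1e-8, 1e-6]}
as_tuples = [("factor_num", [2, 4, 8]), ("lamb", [1e-8, 1e-6])]
candidates = grid(as_dict)
print(len(candidates))  # 6
```

Either shape could then be passed as param_values together with resampling_method, evaluation_metric and search_strategy, for example `FRM(resampling_method='cv', evaluation_metric='rmse', fold_num=5, search_strategy='grid', param_values=as_dict)`.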
- Attributes:
- metadata_: DataFrame
Model metadata content.
- model_: DataFrame
Model (Map, Weight).
- factors_: DataFrame
Decomposed factors.
- optim_param_: DataFrame
Optimal parameters selected.
- stats_: DataFrame
Statistics.
- iter_info_: DataFrame
Cost function value and RMSE of corresponding iteration.
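The per-iteration cost values are also what the sgd_tol and sgd_exit_interval parameters act on. The sketch below shows one plausible reading of that exit rule on a plain list of costs: stop at iteration t when the cost has dropped by no more than sgd_tol compared with sgd_exit_interval steps earlier. The function name and the cost trace are hypothetical, and the exact comparison PAL performs is an assumption.

```python
from typing import Optional, Sequence


def sgd_exit_iteration(costs: Sequence[float], sgd_tol: float = 1e-5,
                       sgd_exit_interval: int = 5) -> Optional[int]:
    """Return the first iteration index at which the exit rule fires, or None.

    Assumed reading of the rule: exit at iteration t when
    costs[t - sgd_exit_interval] - costs[t] <= sgd_tol.
    """
    for t in range(sgd_exit_interval, len(costs)):
        if costs[t - sgd_exit_interval] - costs[t] <= sgd_tol:
            return t
    return None


# Hypothetical cost trace: drops quickly, then plateaus.
costs = [10.0, 5.0, 2.0, 1.0, 0.5, 0.5, 0.5, 0.5]
stop_at = sgd_exit_iteration(costs, sgd_tol=1e-5, sgd_exit_interval=3)
print(stop_at)  # 7
```

A larger sgd_exit_interval makes the rule more tolerant of short plateaus, at the price of running more iterations before exiting.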
Methods
- create_model_state([model, function, ...]): Create PAL model state.
- delete_model_state([state]): Delete PAL model state.
- fit(data, usr_info, item_info[, key, usr, ...]): Fit the model to the training dataset.
- predict(data, usr_info, item_info[, key, ...]): Prediction for the input data with the trained FRM model.
- set_model_state(state): Set the model state by state information.
- fit(data, usr_info, item_info, key=None, usr=None, item=None, feedback=None, features=None, usr_features=None, item_features=None, usr_key=None, item_key=None, categorical_variable=None, usr_categorical_variable=None, item_categorical_variable=None)
Fit the model to the training dataset.
- Parameters:
- data: DataFrame
Data to be fit.
- usr_info: DataFrame
DataFrame containing user side features.
- item_info: DataFrame
DataFrame containing item side features.
- key: str, optional
Name of the ID column.
If key is not provided, then:
if data is indexed by a single column, key defaults to that index column;
otherwise, it is assumed that data contains no ID column.
- usr: str, optional
Name of the user column.
Defaults to the first non-key column of data.
- item: str, optional
Name of the item column.
Defaults to the first non-key and non-usr column of the input data.
- feedback: str, optional
Name of the feedback column.
Defaults to the last column of the input data.
- features: str or a list of str, optional
Name(s) of the global side feature columns in the training dataframe.
Defaults to the rest of the input data after removing the key, usr, item and feedback columns.
- usr_features: str or a list of str, optional
Name(s) of the user side feature columns in usr_info.
Defaults to all columns in usr_info exclusive of the one specified by usr_key.
- item_features: str or a list of str, optional
Name(s) of the item side feature columns in item_info.
Defaults to all columns in item_info exclusive of the one specified by item_key.
- usr_key: str, optional
Specifies the column in usr_info that contains user names or IDs.
Defaults to the 1st column of usr_info.
- item_key: str, optional
Specifies the column in item_info that contains item names or IDs.
Defaults to the 1st column of item_info.
- categorical_variable: str or a list of str, optional
Specifies which INTEGER columns should be treated as categorical, with all other INTEGER columns treated as continuous.
No default value.
- usr_categorical_variable: str or a list of str, optional
Name of user side feature columns of INTEGER type that should be treated as categorical.
- item_categorical_variable: str or a list of str, optional
Name of item side feature columns of INTEGER type that should be treated as categorical.
- Returns:
- A fitted object of class "FRM".
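The column defaults described for fit() can be summarized in a few lines. The sketch below applies them to a plain list of column names; the function `resolve_fit_columns` is a hypothetical illustration, not part of hana_ml, and it does not model the case where key defaults to the index column of data.

```python
from typing import Dict, List, Optional


def resolve_fit_columns(columns: List[str], key: Optional[str] = None,
                        usr: Optional[str] = None, item: Optional[str] = None,
                        feedback: Optional[str] = None) -> Dict[str, object]:
    """Apply the documented defaults for usr, item, feedback and features."""
    non_key = [c for c in columns if c != key]
    if usr is None:
        usr = non_key[0]            # first non-key column
    if item is None:
        item = [c for c in non_key if c != usr][0]  # first non-key, non-usr column
    if feedback is None:
        feedback = columns[-1]      # last column
    # Everything left over is treated as a global side feature.
    features = [c for c in columns if c not in (key, usr, item, feedback)]
    return {"key": key, "usr": usr, "item": item,
            "feedback": feedback, "features": features}


# Hypothetical training table with an ID column and one global side feature.
resolved = resolve_fit_columns(["ID", "USER", "MOVIE", "TIMESTAMP", "FEEDBACK"], key="ID")
print(resolved["features"])  # ['TIMESTAMP']
```

Passing usr, item or feedback explicitly is the safer choice whenever the column order of data is not under your control.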
- predict(data, usr_info, item_info, key=None, usr=None, item=None, features=None, thread_ratio=None)
Prediction for the input data with the trained FRM model.
- Parameters:
- data: DataFrame
Data to be used for prediction.
- usr_info: DataFrame
User side features.
- item_info: DataFrame
Item side features.
- key: str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- usr: str, optional
Name of the column containing user name or user ID.
If not provided, it defaults to the 1st non-ID column of data.
- item: str, optional
Name of the column containing item name or item ID.
If not provided, it defaults to the 1st non-ID, non-usr column of data.
- features: str or a list of str, optional
Name(s) of the global side feature columns in data.
Defaults to all columns of data other than the key, usr and item columns.
- thread_ratio: float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 0.
- Returns:
- DataFrame
Prediction result of FRM algorithm, structured as follows:
1st column: Data ID
2nd column: User name/ID
3rd column: Item name/ID
4th column: Predicted rating
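Given a result with these four columns, a common follow-up step is to pick the highest predicted rating per user. The sketch below does this on plain tuples; the rows are the ones visible in the predict() example output (elided rows omitted, and treating ITEM values as strings is an assumption), while the helper name `best_item_per_user` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, Optional[str], float]  # (ID, USER, ITEM, PREDICTION)


def best_item_per_user(rows: List[Row]) -> Dict[str, Tuple[Optional[str], float]]:
    """Keep, for each user, the item with the highest predicted rating."""
    best: Dict[str, Tuple[Optional[str], float]] = {}
    for _id, user, item, rating in rows:
        if user not in best or rating > best[user][1]:
            best[user] = (item, rating)
    return best


# Rows taken from the example output; ITEM is assumed to be a string column.
rows: List[Row] = [
    (1, "A", None, 3.486804),
    (2, "A", "4", 3.490246),
    (7, "D", "3", 4.097683),
    (8, "E", "2", 2.317224),
]
best = best_item_per_user(rows)
print(best["A"])  # ('4', 3.490246)
```

With a real result, the rows could be obtained from the collected pandas DataFrame, for instance via `list(res.collect().itertuples(index=False, name=None))`.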
- create_model_state(model=None, function=None, pal_funcname='PAL_FRM', state_description=None, force=False)
Create PAL model state.
- Parameters:
- model: DataFrame, optional
Specify the model for AFL state.
Defaults to self.model_.
- function: str, optional
Specify the function in the unified API.
A placeholder parameter, not effective for FRM.
- pal_funcname: int or str, optional
PAL function name.
Defaults to 'PAL_FRM'.
- state_description: str, optional
Description of the state as model container.
Defaults to None.
- force: bool, optional
If True, it will delete the existing state.
Defaults to False.
- set_model_state(state)
Set the model state by state information.
- Parameters:
- state: DataFrame or dict
If state is a DataFrame, it has the following structure:
NAME: VARCHAR(100), it must have STATE_ID, HINT, HOST and PORT.
VALUE: VARCHAR(1000), the values according to NAME.
If state is a dict, the keys must include STATE_ID, HINT, HOST and PORT.
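When state is supplied as a dict, a quick check for the four required keys can catch mistakes before the call. This is a minimal sketch: the helper `missing_state_keys` is hypothetical and the values are placeholders, since real values come from an existing model state.

```python
from typing import Dict, List

# Keys the state information must contain, per the description above.
REQUIRED_STATE_KEYS = ("STATE_ID", "HINT", "HOST", "PORT")


def missing_state_keys(state: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent from a state dict."""
    return [k for k in REQUIRED_STATE_KEYS if k not in state]


# Placeholder values only.
state = {"STATE_ID": "<state-id>", "HINT": "<hint>", "HOST": "<host>", "PORT": "<port>"}
missing = missing_state_keys(state)
print(missing)  # []
```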
- delete_model_state(state=None)
Delete PAL model state.
- Parameters:
- state: DataFrame, optional
Specifies the state.
Defaults to self.state.
Inherited Methods from PALBase
Besides the methods mentioned above, the FRM class also inherits methods from the PALBase class; please refer to PAL Base for more details.