FFMRanker
- class hana_ml.algorithms.pal.recommender.FFMRanker(ordering=None, normalise=None, include_linear=None, early_stop=None, random_state=None, factor_num=None, max_iter=None, train_size=None, learning_rate=None, linear_lamb=None, poly2_lamb=None, tol=None, exit_interval=None, handle_missing=None)
Field-Aware Factorization Machine with the task of ranking using ordinal regression.
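A minimal sketch of the pairwise interaction at the core of a field-aware factorization machine, in plain Python. It follows the standard FFM formulation, not PAL's internal implementation, which this page does not describe. The function `ffm_score`, its arguments, and the weights in the example are hypothetical. The ordinal-regression step that turns the raw score into a rank is omitted.

```python
from itertools import combinations


def ffm_score(features, latent, linear=None, bias=0.0):
    """Raw FFM score for a single instance (illustrative only).

    features: list of (field, feature_name, value) triples.
    latent:   dict mapping (feature_name, other_field) to a list of
              floats; its length plays the role of factor_num.
    linear:   optional dict mapping feature_name to a linear weight;
              passing None mimics include_linear=False.
    """
    score = bias
    if linear is not None:
        score += sum(linear.get(name, 0.0) * value for _, name, value in features)
    # Each pair of features interacts through the latent vector that
    # each feature keeps for the *other* feature's field.
    for (f1, n1, v1), (f2, n2, v2) in combinations(features, 2):
        w1 = latent[(n1, f2)]
        w2 = latent[(n2, f1)]
        score += sum(a * b for a, b in zip(w1, w2)) * v1 * v2
    return score


# Hypothetical weights for one (USER, MOVIE) instance, two latent factors.
features = [("USER", "A", 1.0), ("MOVIE", "Movie1", 1.0)]
latent = {("A", "MOVIE"): [0.1, 0.2], ("Movie1", "USER"): [0.3, 0.4]}
linear = {"A": 0.5, "Movie1": -0.2}

with_linear = ffm_score(features, latent, linear)
without_linear = ffm_score(features, latent)
```

Here the user's latent vector is indexed by the MOVIE field and the movie's by the USER field, which is what distinguishes FFM from a plain factorization machine.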
- Parameters:
- factor_numint, optional
The factorization dimensionality.
Default to 4.
- random_stateint, optional
Specifies the seed for random number generator.
0: Uses the current time as the seed.
Others: Uses the specified value as the seed.
Default to 0.
- train_sizefloat, optional
The proportion of the data used for training; the remainder is used for validation.
For example, 0.8 indicates that 80% of the data is used for training and the remaining 20% for validation.
Default to 0.8 if the number of instances is not less than 40, 1.0 otherwise.
- max_iterint, optional
Specifies the maximum number of iterations for the SGD algorithm.
Default to 20.
- orderinga list of str, optional
Specifies the order of the categories (ascending) for ranking.
No default value.
- normalisebool, optional
Specifies whether to normalize each instance so that its L1 norm is 1.
Default to True.
- include_linearbool, optional
Specifies whether to include the linear part of the model.
Default to True.
- early_stopbool, optional
Specifies whether to early stop the SGD optimization.
Valid only if the value of train_size is less than 1.
Default to True.
- learning_ratefloat, optional
The learning rate for SGD iteration.
Default to 0.2.
- linear_lambfloat, optional
The L2 regularization parameter for the linear coefficient vector.
Default to 1e-5.
- poly2_lambfloat, optional
The L2 regularization parameter for factorized coefficient matrix of the quadratic term.
Default to 1e-5.
- tolfloat, optional
The criterion to determine the convergence of SGD.
Default to 1e-5.
- exit_intervalint, optional
The interval, in iterations, between the two iterations that are compared to determine convergence.
Default to 5.
- handle_missing{'skip', 'fill_zero'}, optional
Specifies how to handle missing feature values:
'skip': remove rows with missing values.
'fill_zero': replace missing values with 0.
Default to 'fill_zero'.
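The effect of the `handle_missing` and `normalise` options described above can be pictured with a small plain-Python sketch. The helper names are hypothetical, and the order in which PAL applies the two steps is an assumption; the source only states what each option does.

```python
def handle_missing_rows(rows, strategy="fill_zero"):
    """Illustrative stand-in for the handle_missing option.

    rows: list of lists of floats, with None marking a missing value.
    """
    if strategy == "skip":
        # Remove every row that contains at least one missing value.
        return [row for row in rows if None not in row]
    if strategy == "fill_zero":
        # Replace each missing value with 0.
        return [[0.0 if v is None else v for v in row] for row in rows]
    raise ValueError("strategy must be 'skip' or 'fill_zero'")


def l1_normalise(row):
    """Scale one instance so that the sum of absolute values is 1."""
    norm = sum(abs(v) for v in row)
    if norm == 0:
        return list(row)  # an all-zero instance is left unchanged
    return [v / norm for v in row]


rows = [[1.0, 3.0], [None, 2.0]]

skipped = handle_missing_rows(rows, "skip")
filled = handle_missing_rows(rows, "fill_zero")
# Assumption: missing values are handled before normalisation.
normalised = [l1_normalise(row) for row in filled]
```

With 'skip' the second row is dropped; with 'fill_zero' it is kept and, after normalisation, all of its weight falls on the remaining feature.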
Examples
Input DataFrame df_train_ranker:
>>> df_train_ranker.collect()
   USER   MOVIE  TIMESTAMP       CTR
0     A  Movie1        3.0    medium
1     A  Movie2        3.0  too high
...
35    E  Movie7        4.0       low
36    E  Movie8        3.0   too low
Create a FFMRanker instance:
>>> ffm = FFMRanker(ordering=['too low', 'low', 'medium', 'high', 'too high'], factor_num=4, early_stop=True, learning_rate=0.2, max_iter=20, train_size=0.8, linear_lamb=1e-5, poly2_lamb=1e-6, random_state=1)
Perform fit():
>>> ffm.fit(data=df_train_ranker, categorical_variable='TIMESTAMP')
>>> ffm.stats_.collect()
  STAT_NAME          STAT_VALUE
0      task             ranking
...
6   tr-loss  1.3432013591533276
7   va-loss  1.5509792122994928
Perform predict():
>>> res = ffm.predict(data=df_predict, key='ID', thread_ratio=1)
>>> res.collect()
   ID   SCORE  CONFIDENCE
0   1    high    0.294206
1   2  medium    0.209893
...
8   9    high    0.282633
- Attributes:
- meta_DataFrame
Model metadata content.
- coef_DataFrame
DataFrame containing the following information:
Feature name,
Field name,
The factorization number,
The parameter value.
- stats_DataFrame
Statistics.
- cross_valid_DataFrame
Cross validation content.
Methods
fit(data[, key, features, label, ...]): Fit the model to the training dataset.
predict(data[, key, features, thread_ratio, ...]): Prediction for the input data with the trained FFMRanker model.
- fit(data, key=None, features=None, label=None, categorical_variable=None, delimiter=None)
Fit the model to the training dataset.
- Parameters:
- dataDataFrame
Data to be fit.
- keystr, optional
Name of the ID column.
If key is not provided, then:
if data is indexed by a single column, key defaults to that index column;
otherwise, it is assumed that data contains no ID column.
- featuresstr or a list of str, optional
Names of the feature columns.
- delimiterstr, optional
The delimiter to separate string features.
For example, "China, USA" indicates two feature values "China" and "USA".
Default to ','.
- labelstr, optional
Specifies the dependent variable.
For ranking, the label column must have categorical data type.
Default to last column name.
- categorical_variablestr or a list of str, optional
Specifies which columns should be treated as categorical variables even though their data type is INTEGER.
By default, 'VARCHAR' and 'NVARCHAR' columns are treated as categorical variables, and 'INTEGER' and 'DOUBLE' columns as continuous variables.
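The `delimiter` and `categorical_variable` rules above can be sketched in plain Python as follows. Both helpers are hypothetical and only mirror the behaviour stated in the parameter descriptions. Stripping whitespace around each value is an assumption, made so that "China, USA" yields "China" and "USA" as in the example.

```python
def split_feature(value, delimiter=","):
    """Split one string cell into its individual feature values."""
    # Assumption: surrounding whitespace and empty pieces are discarded.
    return [part.strip() for part in value.split(delimiter) if part.strip()]


def is_categorical(column, sql_type, categorical_variable=()):
    """Decide whether a column is treated as categorical.

    VARCHAR and NVARCHAR columns are categorical by default; INTEGER and
    DOUBLE columns are continuous unless listed in categorical_variable.
    """
    if isinstance(categorical_variable, str):
        categorical_variable = [categorical_variable]
    if sql_type.upper() in ("VARCHAR", "NVARCHAR"):
        return True
    return column in categorical_variable


countries = split_feature("China, USA")
user_is_cat = is_categorical("USER", "NVARCHAR")
year_default = is_categorical("YEAR", "INTEGER")
year_listed = is_categorical("YEAR", "INTEGER", categorical_variable="YEAR")
```

The column name `YEAR` is made up for illustration and does not appear in the example data.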
- Returns:
- A fitted object of class "FFMRanker".
- predict(data, key=None, features=None, thread_ratio=None, handle_missing=None)
Prediction for the input data with the trained FFMRanker model.
- Parameters:
- dataDataFrame
Data to be used for prediction.
- keystr, optional
Name of the ID column.
Mandatory if data is not indexed, or the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- featuresstr or a list of str, optional
Global side features column name in the training dataframe.
- thread_ratiofloat, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Default to -1.
- handle_missingstr, optional
Specifies how to handle missing feature values:
'skip': remove rows with missing values.
'fill_zero': replace missing values with 0.
Default to 'fill_zero'.
- Returns:
- DataFrame
Prediction result, structured as follows:
1st column : ID
2nd column : SCORE, i.e. predicted ranking
3rd column : CONFIDENCE, the confidence for ranking.
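One way to consume the result is to sort the rows by predicted rank, using the same category order that was passed as `ordering`. The sketch below works on plain tuples copied from the example output above. In practice the rows would come from `res.collect()`. The tie-breaking by confidence is an illustrative choice, not something the library prescribes.

```python
# Category order as passed to FFMRanker(ordering=...), ascending.
ordering = ['too low', 'low', 'medium', 'high', 'too high']
rank = {label: position for position, label in enumerate(ordering)}

# (ID, SCORE, CONFIDENCE) rows copied from the example output.
predictions = [
    (1, 'high', 0.294206),
    (2, 'medium', 0.209893),
    (9, 'high', 0.282633),
]

# Highest predicted category first; ties broken by higher confidence.
ranked = sorted(predictions, key=lambda row: (rank[row[1]], row[2]), reverse=True)
ranked_ids = [row[0] for row in ranked]
```

Sorting on the position in `ordering` matters because the SCORE column holds category labels, whose alphabetical order does not match their rank.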
Inherited Methods from PALBase
Besides the methods mentioned above, the FFMRanker class also inherits methods from the PALBase class. Please refer to PAL Base for more details.