FFMRanker

class hana_ml.algorithms.pal.recommender.FFMRanker(ordering=None, normalise=None, include_linear=None, early_stop=None, random_state=None, factor_num=None, max_iter=None, train_size=None, learning_rate=None, linear_lamb=None, poly2_lamb=None, tol=None, exit_interval=None, handle_missing=None)

Field-Aware Factorization Machine (FFM) for ranking tasks.

Parameters
factor_num : int, optional

The dimensionality of the factorization.

Defaults to 4.

random_state : int, optional

Specifies the seed for the random number generator.

  • 0: Uses the current time as the seed.

  • Others: Uses the specified value as the seed.

Defaults to 0.

train_size : float, optional

The proportion of the data used for training; the remainder is used for validation.

For example, 0.8 means that 80% of the data is used for training and the remaining 20% for validation.

Defaults to 0.8 if the number of instances is at least 40, and to 1.0 otherwise.
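The default rule for train_size can be sketched in plain Python. The helper names default_train_size and split_counts are hypothetical, and the rounding of the training row count is an assumption; the source does not say how PAL rounds or which rows it selects.

```python
def default_train_size(n_instances):
    """Documented fallback: 0.8 with at least 40 instances, 1.0 otherwise."""
    return 0.8 if n_instances >= 40 else 1.0


def split_counts(n_instances, train_size=None):
    """Return (training rows, validation rows) for a given proportion.

    Rounding to the nearest integer is an assumption made for this sketch.
    """
    if train_size is None:
        train_size = default_train_size(n_instances)
    n_train = int(round(n_instances * train_size))
    return n_train, n_instances - n_train
```

With fewer than 40 instances and no explicit train_size, no rows are held out for validation, which is why early_stop has no effect in that case.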

max_iter : int, optional

Specifies the maximum number of iterations for the SGD optimization.

Defaults to 20.

ordering : ListOfStrings, optional

Specifies the order of the categories (ascending) used for ranking.

No default value.

normalise : bool, optional

Specifies whether to normalize each instance so that its L1 norm is 1.

Defaults to True.
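To make the effect of normalise concrete, the following minimal sketch scales one instance so that the absolute values of its features sum to 1. The function name l1_normalise and the handling of an all-zero instance are assumptions for illustration, not PAL behaviour.

```python
def l1_normalise(values):
    """Scale a feature vector so that the sum of absolute values equals 1."""
    total = sum(abs(v) for v in values)
    if total == 0:
        # Assumption: an all-zero instance is returned unchanged.
        return list(values)
    return [v / total for v in values]
```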

include_linear : bool, optional

Specifies whether to include the linear part of the model.

Defaults to True.
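As a rough illustration of what include_linear toggles, the sketch below computes a raw score with the standard field-aware factorization machine formulation from the literature. It is not PAL's implementation: the function name ffm_score, the dictionary-based data layout, and the bias term w0 are assumptions, and the step that turns a raw score into an ordered category is not shown.

```python
def ffm_score(x, fields, w0, w, v, include_linear=True):
    """Raw FFM score for one instance.

    x      : dict, feature index -> feature value
    fields : dict, feature index -> field index
    w0     : float, bias term (assumed here)
    w      : dict, feature index -> linear weight
    v      : dict, (feature index, field index) -> list of latent factors
    """
    feats = sorted(x)
    score = w0
    if include_linear:
        # Linear part: one weight per feature.
        score += sum(w[i] * x[i] for i in feats)
    # Pairwise part: feature i uses the latent vector it keeps for the
    # field of feature j, and vice versa.
    for pos, i in enumerate(feats):
        for j in feats[pos + 1:]:
            vi = v[(i, fields[j])]
            vj = v[(j, fields[i])]
            score += sum(p * q for p, q in zip(vi, vj)) * x[i] * x[j]
    return score
```

The length of each latent vector in v corresponds to factor_num.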

early_stop : bool, optional

Specifies whether to stop the SGD optimization early.

Valid only if the value of train_size is less than 1.

Defaults to True.

learning_rate : float, optional

The learning rate for the SGD iterations.

Defaults to 0.2.

linear_lamb : float, optional

The L2 regularization parameter for the linear coefficient vector.

Defaults to 1e-5.

poly2_lamb : float, optional

The L2 regularization parameter for the factorized coefficient matrix of the quadratic term.

Defaults to 1e-5.

tol : float, optional

The tolerance used to determine the convergence of SGD.

Defaults to 1e-5.

exit_interval : int, optional

The number of iterations between the two iterations that are compared to determine convergence.

Defaults to 5.
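One plausible reading of how tol and exit_interval interact is sketched below: the loss of the latest iteration is compared with the loss recorded exit_interval iterations earlier. The function name has_converged and the use of an absolute difference are assumptions; the source does not specify the exact criterion PAL applies.

```python
def has_converged(losses, tol=1e-5, exit_interval=5):
    """Check convergence from a list of per-iteration loss values.

    Assumption: converged when the absolute change in loss over the last
    `exit_interval` iterations is below `tol`.
    """
    if len(losses) <= exit_interval:
        # Not enough history to compare two iterations yet.
        return False
    return abs(losses[-1 - exit_interval] - losses[-1]) < tol
```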

handle_missing : {'skip', 'fill_zero'}, optional

Specifies how to handle missing feature values:

  • 'skip': remove rows with missing values.

  • 'fill_zero': replace missing values with 0.

Defaults to 'fill_zero'.
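The two handle_missing options can be mimicked on plain Python rows as follows. This is only a sketch of the documented semantics: the function name apply_handle_missing is hypothetical, rows are modelled as dictionaries, and None stands in for a missing value.

```python
def apply_handle_missing(rows, handle_missing='fill_zero'):
    """Return a new list of rows with missing values (None) handled."""
    if handle_missing == 'skip':
        # Drop every row that contains at least one missing value.
        return [row for row in rows
                if all(value is not None for value in row.values())]
    if handle_missing == 'fill_zero':
        # Keep every row and replace missing values with 0.
        return [{name: (0 if value is None else value)
                 for name, value in row.items()}
                for row in rows]
    raise ValueError("handle_missing must be 'skip' or 'fill_zero'")
```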

Examples

Input dataframe for ranking training:

>>> df_train_ranker.collect()
   USER                   MOVIE  TIMESTAMP       CTR
0     A                  Movie1        3.0    medium
1     A                  Movie2        3.0  too high
2     A                  Movie4        1.0    medium
3     A                  Movie5        2.0   too low
4     A                  Movie6        3.0       low
5     A                  Movie8        2.0       low
6     A          Movie0, Movie3        1.0  too high
7     B                  Movie2        3.0      high
8     B                  Movie3        2.0      high
9     B                  Movie4        2.0    medium
10    B                    None        4.0    medium
11    B                  Movie7        1.0      high
12    B                  Movie8        2.0      high
13    B                  Movie0        3.0      high
14    C                  Movie1        2.0    medium
15    C  Movie2, Movie5, Movie7        4.0       low
16    C                  Movie4        3.0   too low
17    C                  Movie5        1.0      high
18    C                  Movie6        NaN  too high
19    C                  Movie7        3.0      high
20    C                  Movie8        1.0  too high
21    C                  Movie0        2.0    medium
22    D                  Movie1        3.0  too high
23    D                  Movie3        2.0  too high
24    D          Movie4, Movie7        2.0  too high
25    D                  Movie6        2.0  too high
26    D                  Movie7        4.0  too high
27    D                  Movie8        3.0   too low
28    D                  Movie0        3.0   too low
29    E                  Movie1        2.0   too low
30    E                  Movie2        2.0  too high
31    E                  Movie3        2.0    medium
32    E                  Movie4        4.0       low
33    E                  Movie5        3.0  too high
34    E                  Movie6        2.0       low
35    E                  Movie7        4.0       low
36    E                  Movie8        3.0   too low

Creating FFMRanker instance:

>>> from hana_ml.algorithms.pal.recommender import FFMRanker
>>> ffm = FFMRanker(ordering=['too low', 'low', 'medium', 'high', 'too high'],
...                 factor_num=4, early_stop=True, learning_rate=0.2, max_iter=20, train_size=0.8,
...                 linear_lamb=1e-5, poly2_lamb=1e-6, random_state=1)

Performing fit() on given dataframe:

>>> ffm.fit(data=df_train_ranker, categorical_variable='TIMESTAMP')
>>> ffm.stats_.collect()
     STAT_NAME                            STAT_VALUE
0         task                               ranking
1  feature_num                                    18
2    field_num                                     3
3        k_num                                     4
4     category  too low, low, medium, high, too high
5         iter                                    14
6      tr-loss                    1.3432013591533276
7      va-loss                    1.5509792122994928

Performing predict() on given predicting dataframe:

>>> res = ffm.predict(data=df_predict, key='ID', thread_ratio=1)
>>> res.collect()
   ID     SCORE  CONFIDENCE
0   1      high    0.294206
1   2    medium    0.209893
2   3   too low    0.316609
3   4      high    0.219671
4   5  too high    0.222545
5   6      high    0.385621
6   7   too low    0.407695
7   8   too low    0.295200
8   9      high    0.282633

Attributes
meta_ : DataFrame

Model metadata content.

coef_ : DataFrame

The DataFrame containing the following information:

  • Feature name,

  • Field name,

  • The factorization number,

  • The parameter value.

stats_ : DataFrame

Statistic values.

cross_valid_ : DataFrame

Cross validation content.

Methods

fit(data[, key, features, label, ...])

Fit the FFMRanker model with the input training data.

predict(data[, key, features, thread_ratio, ...])

Prediction for the input data with the trained FFMRanker model.

property fit_hdbprocedure

Returns the generated hdbprocedure for fit.

property predict_hdbprocedure

Returns the generated hdbprocedure for predict.

fit(data, key=None, features=None, label=None, categorical_variable=None, delimiter=None)

Fit the FFMRanker model with the input training data. Model parameters should be given by initializing the model first.

Parameters
data : DataFrame

Data to be fit.

key : str, optional

Name of the ID column.

If key is not provided, then:

  • if data is indexed by a single column, then key defaults to that index column;

  • otherwise, it is assumed that data contains no ID column.

features : str/ListOfStrings, optional

Names of the feature columns.

delimiter : str, optional

The delimiter used to separate multiple values within a string feature.

For example, with the delimiter ',', the value "China, USA" is read as two feature values, "China" and "USA".

Defaults to ','.
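The splitting described for delimiter can be illustrated with a small helper. The name split_features is hypothetical, and trimming whitespace around each value is an assumption inferred from the "China, USA" example.

```python
def split_features(value, delimiter=','):
    """Split one string feature into its individual feature values."""
    if value is None:
        # A missing value yields no feature values in this sketch.
        return []
    parts = (part.strip() for part in value.split(delimiter))
    return [part for part in parts if part]
```

Applied to the training data shown in the Examples section, the MOVIE value "Movie2, Movie5, Movie7" would contribute three feature values for that row.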

label : str, optional

Specifies the dependent variable.

For ranking, the label column must have a categorical data type.

Defaults to the name of the last column.

categorical_variable : str/ListOfStrings, optional

Specifies the column(s) that should be treated as categorical even though their data type is INTEGER.

By default, columns of type 'VARCHAR' or 'NVARCHAR' are treated as categorical, and columns of type 'INTEGER' or 'DOUBLE' as continuous.

Returns
Fitted object.

predict(data, key=None, features=None, thread_ratio=None, handle_missing=None)

Prediction for the input data with the trained FFMRanker model.

Parameters
data : DataFrame

Data to be predicted.

key : str, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

features : str/ListOfStrings, optional

Names of the feature columns.

thread_ratio : float, optional

Specifies the ratio of available threads to use.

  • 0: single thread

  • 0~1: percentage of available threads

  • Others: heuristically determined

Defaults to -1.
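The three cases for thread_ratio can be summarised in a short helper. This is only an illustration of the documented value ranges; the function name describe_thread_ratio is hypothetical and it does not reproduce how PAL actually allocates threads.

```python
def describe_thread_ratio(thread_ratio=-1):
    """Return a human-readable description of a thread_ratio value."""
    if thread_ratio == 0:
        return 'single thread'
    if 0 < thread_ratio <= 1:
        return '{:.0%} of available threads'.format(float(thread_ratio))
    # Any other value, including the default of -1.
    return 'heuristically determined'
```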

handle_missing : {'skip', 'fill_zero'}, optional

Specifies how to handle missing feature values:

  • 'skip': remove rows with missing values.

  • 'fill_zero': replace missing values with 0.

Defaults to 'fill_zero'.

Returns
DataFrame

Prediction result, structured as follows:

  • 1st column : ID

  • 2nd column : SCORE, i.e. predicted ranking

  • 3rd column : CONFIDENCE, the confidence for ranking.
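Because SCORE holds a category label, sorting the result by predicted rank requires the same ordering that was passed to the constructor. The sketch below shows one way to do this on plain tuples; the rows are copied from the example output above, while the helper name sort_predictions and the tie-break on CONFIDENCE are assumptions for illustration.

```python
ordering = ['too low', 'low', 'medium', 'high', 'too high']
# Map each category label to its position in the ascending ordering.
rank_of = {label: rank for rank, label in enumerate(ordering)}

# (ID, SCORE, CONFIDENCE) rows taken from the example prediction result.
predictions = [
    (1, 'high', 0.294206),
    (2, 'medium', 0.209893),
    (3, 'too low', 0.316609),
    (5, 'too high', 0.222545),
    (6, 'high', 0.385621),
]


def sort_predictions(rows, rank_of):
    """Sort rows from highest to lowest predicted category.

    Ties within a category are broken by higher confidence first.
    """
    return sorted(rows, key=lambda row: (rank_of[row[1]], row[2]), reverse=True)


ranked_ids = [row[0] for row in sort_predictions(predictions, rank_of)]
```

In practice the rows would come from res.collect(), which returns the result as a local table.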

Inherited Methods from PALBase

Besides the methods mentioned above, the FFMRanker class also inherits methods from the PALBase class. Please refer to PAL Base for more details.