FFMClassifier
- class hana_ml.algorithms.pal.recommender.FFMClassifier(ordering=None, normalise=None, include_linear=None, include_constant=None, early_stop=None, random_state=None, factor_num=None, max_iter=None, train_size=None, learning_rate=None, linear_lamb=None, poly2_lamb=None, tol=None, exit_interval=None, handle_missing=None)
Field-Aware Factorization Machine with the task of classification.
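For orientation, the decision function of a field-aware factorization machine as it is usually written in the literature is sketched below. It is an assumption that PAL follows exactly this form. Here n is the number of features, f_j denotes the field that feature j belongs to, and each latent vector v_{i,f} has factor_num components. The constant term w_0 and the linear sum correspond to the include_constant and include_linear parameters listed below.

```latex
\phi(\mathbf{x}) = w_0
  + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n}
    \langle \mathbf{v}_{i,f_j}, \mathbf{v}_{j,f_i} \rangle \, x_i x_j
```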
- Parameters:
- factor_num : int, optional
The factorization dimensionality. Defaults to 4.
- random_state : int, optional
Specifies the seed for the random number generator.
0: Uses the current time as the seed.
Others: Uses the specified value as the seed.
Defaults to 0.
- train_size : float, optional
The proportion of the dataset used for training; the remainder is used for validation.
For example, 0.8 means that 80% of the data is used for training and the remaining 20% for validation.
Defaults to 0.8 if the number of instances is at least 40, and to 1.0 otherwise.
- max_iter : int, optional
Specifies the maximum number of iterations for the alternating least squares algorithm.
Defaults to 20.
- ordering : list of str, optional (deprecated)
Specifies the order of the categories for ranking.
This parameter has no effect for classification problems and will be removed in a future release.
No default value.
- normalise : bool, optional
Specifies whether to normalize each instance so that its L1 norm is 1.
Defaults to True.
- include_constant : bool, optional
Specifies whether to include the constant term w0.
Defaults to True.
- include_linear : bool, optional
Specifies whether to include the linear part of the regression model.
Defaults to True.
- early_stop : bool, optional
Specifies whether to stop the SGD optimization early.
Valid only if the value of thread_ratio is less than 1.
Defaults to True.
- learning_rate : float, optional
The learning rate for SGD iterations.
Defaults to 0.2.
- linear_lamb : float, optional
The L2 regularization parameter for the linear coefficient vector.
Defaults to 1e-5.
- poly2_lamb : float, optional
The L2 regularization parameter for the factorized coefficient matrix of the quadratic term.
Defaults to 1e-5.
- tol : float, optional
The convergence criterion for SGD.
Defaults to 1e-5.
- exit_interval : int, optional
The interval, in iterations, between the two iterations that are compared to determine convergence.
Defaults to 5.
- handle_missing : str, optional
Specifies how to handle missing features:
'skip': skip rows with missing values.
'fill_zero': replace missing values with 0.
Defaults to 'fill_zero'.
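Two of the rules in the parameter list are simple enough to sketch in plain Python. The helpers below are hypothetical illustrations and are not part of hana_ml: `default_train_size` mirrors the documented default for train_size, and `l1_normalise` shows what scaling an instance to unit L1 norm means. Returning an all-zero instance unchanged is an assumption, since the source does not say how PAL treats that case.

```python
from typing import List


def default_train_size(n_instances: int) -> float:
    """Documented default: 0.8 when there are at least 40 instances, else 1.0."""
    return 0.8 if n_instances >= 40 else 1.0


def l1_normalise(instance: List[float]) -> List[float]:
    """Scale one instance so that the absolute values of its entries sum to 1."""
    norm = sum(abs(v) for v in instance)
    if norm == 0.0:
        # Assumption: an all-zero instance is returned unchanged.
        return list(instance)
    return [v / norm for v in instance]


if __name__ == "__main__":
    print(default_train_size(37))
    print(default_train_size(40))
    print(l1_normalise([1.0, 3.0]))
```

Under this rule, a dataset with fewer than 40 rows would be used entirely for training if train_size were left unset.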
Examples
Input DataFrame:
>>> df_train_classification.collect()
   USER   MOVIE  TIMESTAMP        CTR
0     A  Movie1        3.0      Click
1     A  Movie2        3.0      Click
...
35    E  Movie7        4.0  Not click
36    E  Movie8        3.0  Not click
Create an FFMClassifier instance:
>>> ffm = FFMClassifier(linear_lamb=1e-5, poly2_lamb=1e-6, random_state=1, factor_num=4, early_stop=1, learning_rate=0.2, max_iter=20, train_size=0.8)
Perform fit():
>>> ffm.fit(data=df_train_classification, categorical_variable='TIMESTAMP')
>>> ffm.stats_.collect()
     STAT_NAME          STAT_VALUE
0         task      classification
1  feature_num                  18
2    field_num                   3
3        k_num                   4
4     category    Click, Not click
5         iter                   3
6      tr-loss  0.6409316561278655
7      va-loss  0.7452354780967997
Perform predict():
>>> res = ffm.predict(data=df_predict, key='ID', thread_ratio=1)
>>> res.collect()
   ID      SCORE  CONFIDENCE
0   1  Not click    0.543537
1   2  Not click    0.545470
...
7   8  Not click    0.536781
8   9  Not click    0.635412
- Attributes:
- meta_ : DataFrame
Model metadata content.
- coef_ : DataFrame
DataFrame that provides the following information:
Feature name,
Field name,
The factorization number,
The parameter value.
- stats_ : DataFrame
Statistics.
- cross_valid_ : DataFrame
Cross validation content.
Methods
fit(data[, key, features, label, ...]): Fit the model to the training dataset.
get_model_metrics(): Get the model metrics.
get_score_metrics(): Get the score metrics.
predict(data[, key, features, thread_ratio, ...]): Prediction for the input data with the trained FFMClassifier model.
- fit(data, key=None, features=None, label=None, categorical_variable=None, delimiter=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
Data to be fit.
- key : str, optional
Name of the ID column.
If key is not provided, then:
if data is indexed by a single column, then key defaults to that index column;
otherwise, it is assumed that data contains no ID column.
- features : str or a list of str, optional
Name of the feature columns.
- delimiter : str, optional
The delimiter used to separate string features.
For example, "China, USA" indicates two feature values, "China" and "USA".
Defaults to ','.
- label : str, optional
Specifies the dependent variable.
For classification, the label column can be of any data type.
Defaults to the name of the last column.
- categorical_variable : str or a list of str, optional
Specifies the columns to be treated as categorical variables even though their data type is INTEGER.
By default, 'VARCHAR' and 'NVARCHAR' columns are treated as categorical variables, and 'INTEGER' and 'DOUBLE' columns as continuous variables.
- Returns:
- A fitted object of class "FFMClassifier".
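To make the delimiter rule concrete, here is a minimal pure-Python sketch of how a multi-valued string feature could be split into individual feature values. `split_feature` is a hypothetical helper, not a hana_ml function, and trimming whitespace around each value is an assumption inferred from the "China, USA" example.

```python
from typing import List


def split_feature(value: str, delimiter: str = ",") -> List[str]:
    """Split a string feature into its individual feature values."""
    # Assumption: surrounding whitespace is trimmed and empty pieces are dropped.
    return [part.strip() for part in value.split(delimiter) if part.strip()]


if __name__ == "__main__":
    print(split_feature("China, USA"))
```

With the default delimiter, a value that contains no comma is treated as a single feature value.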
- predict(data, key=None, features=None, thread_ratio=None, handle_missing=None)
Prediction for the input data with the trained FFMClassifier model.
- Parameters:
- data : DataFrame
Data to be predicted.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : str or a list of str, optional
Global side features column name in the training dataframe.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to -1.
- handle_missing : str, optional
Specifies how to handle missing features:
'skip': skip rows with missing values.
'fill_zero': replace missing values with 0.
Defaults to 'fill_zero'.
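The two handle_missing options can be illustrated with a small sketch on plain Python rows, where None stands in for a missing value. `apply_handle_missing` is a hypothetical helper written only for illustration; PAL performs this step internally and its exact behaviour may differ.

```python
from typing import List, Optional


def apply_handle_missing(
    rows: List[List[Optional[float]]], handle_missing: str = "fill_zero"
) -> List[List[Optional[float]]]:
    """Apply the 'skip' or 'fill_zero' strategy to rows with missing values."""
    if handle_missing == "skip":
        # Drop every row that contains at least one missing value.
        return [list(row) for row in rows if None not in row]
    if handle_missing == "fill_zero":
        # Keep every row and replace each missing value with 0.
        return [[0.0 if v is None else v for v in row] for row in rows]
    raise ValueError("handle_missing must be 'skip' or 'fill_zero'")


if __name__ == "__main__":
    sample = [[1.0, None], [2.0, 3.0]]
    print(apply_handle_missing(sample, "skip"))
    print(apply_handle_missing(sample, "fill_zero"))
```

Note that 'skip' changes the number of rows, so fewer predictions than input rows would be expected with that option.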
- Returns:
- DataFrame
Prediction result, structured as follows:
1st column : ID
2nd column : SCORE, i.e. predicted class labels
3rd column : CONFIDENCE, the confidence for assigning class labels.
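As a sketch of how the three result columns might be consumed, the snippet below keeps only the predictions whose CONFIDENCE reaches a threshold. The rows are copied from the predict() output in the Examples section and are represented as plain tuples rather than a collected DataFrame. `confident_predictions` and the 0.6 threshold are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# One prediction row: (ID, SCORE, CONFIDENCE)
Prediction = Tuple[int, str, float]


def confident_predictions(rows: List[Prediction], threshold: float) -> List[Prediction]:
    """Return the rows whose CONFIDENCE is at least `threshold`."""
    return [row for row in rows if row[2] >= threshold]


if __name__ == "__main__":
    rows = [
        (1, "Not click", 0.543537),
        (2, "Not click", 0.545470),
        (8, "Not click", 0.536781),
        (9, "Not click", 0.635412),
    ]
    print(confident_predictions(rows, threshold=0.6))
```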
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
In addition to the methods described above, the FFMClassifier class inherits methods from the PALBase class. Refer to PAL Base for more details.