Similar to other predict methods, this function
predicts fitted values from a fitted "UnifiedClassification" object.
# S3 method for UnifiedClassification
predict(
  model,
  data,
  key,
  features = NULL,
  thread.ratio = NULL,
  verbose = NULL,
  func = NULL,
  multi.class = NULL,
  alpha = NULL,
  block.size = NULL,
  missing.replacement = NULL,
  class.map0 = NULL,
  class.map1 = NULL,
  categorical.variable = NULL,
  attribution.method = NULL,
  top.k.attributions = NULL,
  sample.size = NULL,
  random.state = NULL
)
Arguments
| model |
R6Class
A "UnifiedClassification" object for prediction.
|
| data |
DataFrame
DataFrame containing the data.
|
| key |
character
Name of the ID column.
|
| features |
character or list of characters, optional
Names of the feature columns used for prediction.
If not provided, it defaults to all non-key columns of data.
|
| thread.ratio |
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads. Values between 0 and 1 will use up to
that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads
used is then determined heuristically.
Defaults to -1.
|
| verbose |
logical, optional
If TRUE, output all classes and the corresponding
confidences for each data point.
Defaults to FALSE.
|
| func |
character, optional
The functionality of the unified classification model.
Mandatory only when the func attribute of model is NULL.
Valid values are as follows:
"DecisionTree", "RandomDecisionTrees", "HGBT",
"LogisticRegression", "NaiveBayes", "SVM", "MLP".
Defaults to model$func.
|
| multi.class |
logical, optional
If the functionality of the unified classification model is LogisticRegression,
this parameter indicates whether the classification model is a
binary-class or a multi-class model.
Valid only when func is "LogisticRegression".
|
| alpha |
double, optional
Specifies the value of Laplace smoothing.
A positive value will enable Laplace smoothing for categorical variables
with that value being the smoothing parameter.
Set the value to 0 to disable Laplace smoothing.
Defaults to the alpha value in the JSON model if there is one, and 0 otherwise.
|
| block.size |
integer, optional
Specifies the number of data records loaded at a time during scoring.
Valid only when func is "RandomDecisionTrees" (case-insensitive).
Defaults to 0. |
| missing.replacement |
character, optional
Specifies the strategy for replacement of missing values in prediction data.
Valid only when func is "RandomDecisionTrees" or "HGBT".
Defaults to 'feature.marginalized'. |
| class.map0 |
character, optional
Specifies the label value which will be mapped to 0 in logistic regression.
Mandatory and valid only for logistic regression models when the label variable is of type VARCHAR or NVARCHAR.
Defaults to the value of class.map0 in the model training phase.
|
| class.map1 |
character, optional
Specifies the label value which will be mapped to 1 in logistic regression.
Mandatory and valid only for logistic regression models when the label variable is of type VARCHAR or NVARCHAR.
Defaults to the value of class.map1 in the model training phase.
|
| categorical.variable |
character or list of characters, optional
Indicates features that should be treated as categorical variables.
The behavior is dependent on what input is given:
Valid only for variables of type "INTEGER", ignored otherwise.
Defaults to the value of categorical.variable in the model training phase. |
| attribution.method |
character, optional
Specifies which method to use in model reasoning:
Valid only for tree-based classification models.
Defaults to "shap". |
| top.k.attributions |
integer, optional
Outputs the attributions of the top k features that contribute the most.
Defaults to 10.
|
| sample.size |
integer, optional
Specifies the number of sampled combinations of features.
If set to 0, the value is determined by algorithm heuristically.
Valid only when the trained classification model is Naive Bayes, Support Vector Machine (SVM),
Multilayer Perceptron (MLP) or Multi-class Logistic Regression.
Defaults to 0.
|
| random.state |
integer, optional
Specifies the seed for the random number generator.
Valid only when the trained classification model is Naive Bayes, Support Vector Machine (SVM),
Multilayer Perceptron (MLP) or Multi-class Logistic Regression.
Defaults to 0. |
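The thread.ratio rules described above can be summarised in a small sketch. The helper name describe_thread_ratio is hypothetical and not part of the package; it only restates the documented interpretation in plain R and does not reproduce the heuristic that chooses the thread count.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# restates how a thread.ratio value is interpreted per the argument description.
describe_thread_ratio <- function(thread.ratio = -1) {
  stopifnot(is.numeric(thread.ratio), length(thread.ratio) == 1)
  # Values outside [0, 1] are ignored; the thread count is chosen heuristically.
  if (is.na(thread.ratio) || thread.ratio < 0 || thread.ratio > 1) {
    return("heuristic")
  }
  if (thread.ratio == 0) {
    return("single thread")
  }
  if (thread.ratio == 1) {
    return("all available threads")
  }
  # Values strictly between 0 and 1: up to that share of available threads.
  sprintf("up to %g%% of available threads", thread.ratio * 100)
}

describe_thread_ratio(0)
describe_thread_ratio(0.5)
describe_thread_ratio(-1)   # the documented default falls in the heuristic case
```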
S3 methods
Value
Predicted values are returned as a DataFrame, structured as follows.
ID column name
SCORE
CONFIDENCE
REASON CODE
Examples
Input data for prediction:
> df.predict
  ID  OUTLOOK   TEMP HUMIDITY WINDY
1  0 Overcast     75   -10000   Yes
2  1     Rain     78       70   Yes
3  2    Sunny -10000       NA   Yes
4  3    Sunny     69       70   Yes
5  4     Rain     NA       70   Yes
6  5     <NA>     70       70   Yes
7  6      ***     70       70   Yes
Call the predict() function:
> res <- predict(model = uc.dt,
                 data = df.predict,
                 key = "ID",
                 func = "DecisionTree")
Check the result:
> res$Collect()[1:3]
  ID       SCORE CONFIDENCE
1  0        Play  1.0000000
2  1 Do not Play  1.0000000
3  2        Play  0.5000000
4  3        Play  0.5000000
5  4        Play  0.6363636
6  5        Play  0.5000000
7  6        Play  0.5000000
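Once collected, the result is an ordinary R data frame and can be inspected with base R. The sketch below works on a local copy of the values printed above, so it runs without a database connection. The name res.local and the 0.6 threshold are assumptions made for illustration.

```r
# Local copy of the collected result (values transcribed from the output above).
res.local <- data.frame(
  ID = 0:6,
  SCORE = c("Play", "Do not Play", "Play", "Play", "Play", "Play", "Play"),
  CONFIDENCE = c(1, 1, 0.5, 0.5, 0.6363636, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical confidence threshold, chosen only for this illustration.
threshold <- 0.6

# Rows whose predicted class has a confidence below the threshold.
low.conf <- res.local[res.local$CONFIDENCE < threshold, ]
low.conf$ID

# Number of rows per predicted class.
table(res.local$SCORE)
```

In this example the low-confidence rows (IDs 2, 3, 5 and 6) all have a confidence of 0.5.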
See also