Similar to other predict methods, this function predicts fitted values from a fitted "UnifiedClassification" object.

# S3 method for UnifiedClassification
predict(
  model,
  data,
  key,
  features = NULL,
  thread.ratio = NULL,
  verbose = NULL,
  func = NULL,
  multi.class = NULL,
  alpha = NULL,
  block.size = NULL,
  missing.replacement = NULL,
  class.map0 = NULL,
  class.map1 = NULL,
  categorical.variable = NULL,
  attribution.method = NULL,
  top.k.attributions = NULL,
  sample.size = NULL,
  random.state = NULL
)

Arguments

model

R6Class
A "UnifiedClassification" object for prediction.

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

features

character or list of characters, optional
Name of feature columns for prediction.
If not provided, it defaults to all non-key columns of data.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads. Values between 0 and 1 will use up to that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads used is then heuristically determined.
Defaults to -1.
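The thread.ratio rules above can be sketched in plain R. This is only an illustration of the documented semantics, not the library's internal scheduling: the helper name max_threads_for_ratio, the available argument, and the rounding with floor() are all assumptions made for the example.

```r
# Illustrative only: mirrors the documented thread.ratio rules.
# `max_threads_for_ratio` and `available` are hypothetical names,
# and rounding down with floor() is an assumption.
max_threads_for_ratio <- function(thread.ratio, available) {
  if (is.null(thread.ratio) || thread.ratio < 0 || thread.ratio > 1) {
    # Out-of-range values are ignored; the thread count is then
    # chosen heuristically, which this sketch represents as NA.
    return(NA_integer_)
  }
  if (thread.ratio == 0) {
    return(1L)  # 0 indicates a single thread
  }
  # Upper bound: the given share of the available threads, at least one.
  max(1L, as.integer(floor(thread.ratio * available)))
}

max_threads_for_ratio(0, 8)    # single thread
max_threads_for_ratio(0.5, 8)  # up to half of the 8 threads
max_threads_for_ratio(1, 8)    # all 8 threads
max_threads_for_ratio(-1, 8)   # NA: left to the heuristic (the default)
```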

verbose

logical, optional
If TRUE, outputs all classes and the corresponding confidences for each data point.
Defaults to FALSE.

func

character, optional
The functionality of the unified classification model.
Mandatory only when the func attribute of model is NULL.
Valid values are as follows:
"DecisionTree", "RandomDecisionTrees", "HGBT", "LogisticRegression", "NaiveBayes", "SVM", "MLP".
Defaults to model$func.

multi.class

logical, optional
If the functionality of the unified classification model is LogisticRegression,
this parameter indicates whether the classification model is binary-class or multi-class.
Valid only when func is "LogisticRegression".

alpha

double, optional
Specifies the value of Laplace smoothing.
A positive value will enable Laplace smoothing for categorical variables with that value being the smoothing parameter.
Set the value to 0 to disable Laplace smoothing.
Defaults to the alpha value in the JSON model if there is one, and 0 otherwise.

block.size

integer, optional
Specifies the number of data rows loaded at a time during scoring.

  • 0: loads all data at once

  • Other positive values: loads the specified number of rows at a time

Valid only when func is "RandomDecisionTrees" (case-insensitive).
Defaults to 0.
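As a rough illustration of what block.size means for scoring, the sketch below splits a row count into consecutive blocks. It is not how the library loads data internally; the helper name block_ranges and the 7-row input are assumptions chosen to match the size of the example data.

```r
# Illustrative only: shows how a row count is partitioned for a given
# block.size. `block_ranges` is a hypothetical helper, not a package function.
block_ranges <- function(n.rows, block.size = 0L) {
  if (block.size <= 0L || block.size >= n.rows) {
    # block.size = 0 (the default): all rows are loaded at once.
    return(list(c(1L, n.rows)))
  }
  starts <- seq(1L, n.rows, by = block.size)
  # Each block covers block.size rows; the last block may be shorter.
  lapply(starts, function(s) c(s, min(s + block.size - 1L, n.rows)))
}

block_ranges(7L)       # one block covering rows 1 to 7
block_ranges(7L, 3L)   # three blocks: rows 1-3, 4-6, and 7
```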

missing.replacement

character, optional
Specifies the strategy for replacement of missing values in prediction data.

  • 'feature.marginalized': marginalizes each missing feature out independently

  • 'instance.marginalized': marginalizes all missing features in an instance as a whole corresponding to each category

Valid only when func is "RandomDecisionTrees" or "HGBT".
Defaults to 'feature.marginalized'.

class.map0

character, optional
Specifies the label value which will be mapped to 0 in logistic regression.
Mandatory and valid only for logistic regression models when the label variable is of type VARCHAR or NVARCHAR.
Defaults to the value of class.map0 in the model training phase.

class.map1

character, optional
Specifies the label value which will be mapped to 1 in logistic regression.
Mandatory and valid only for logistic regression models when the label variable is of type VARCHAR or NVARCHAR.
Defaults to the value of class.map1 in the model training phase.

categorical.variable

character or list of characters, optional
Indicates features that should be treated as categorical variables.
By default, the treatment depends on the column type:

  • "VARCHAR" and "NVARCHAR": categorical.

  • "INTEGER" and "DOUBLE": continuous.

Valid only for variables of type "INTEGER"; ignored otherwise.
Defaults to the value of categorical.variable in the model training phase.
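The type rules for categorical.variable can be summarized in a small base R sketch. The helper feature_treatment and the column types assigned to OUTLOOK, TEMP and HUMIDITY are assumptions for illustration; the actual column types depend on the table in the database.

```r
# Illustrative only: applies the documented rules to a named vector of
# SQL column types. `feature_treatment` is a hypothetical helper.
feature_treatment <- function(sql.types, categorical.variable = NULL) {
  # Default treatment by type: string columns are categorical,
  # numeric columns are continuous.
  treatment <- ifelse(sql.types %in% c("VARCHAR", "NVARCHAR"),
                      "categorical", "continuous")
  names(treatment) <- names(sql.types)
  # The override only takes effect for INTEGER columns.
  override <- names(sql.types) %in% categorical.variable &
    sql.types == "INTEGER"
  treatment[override] <- "categorical"
  treatment
}

# Assumed column types for the example features.
types <- c(OUTLOOK = "VARCHAR", TEMP = "INTEGER", HUMIDITY = "DOUBLE")
feature_treatment(types)
feature_treatment(types, categorical.variable = c("TEMP", "HUMIDITY"))
```

In the second call TEMP becomes categorical, while HUMIDITY stays continuous because the setting is ignored for non-INTEGER columns.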

attribution.method

character, optional
Specifies which method to use in model reasoning:

  • "no": no reasoning

  • "saabas": SAABAS reasoning

  • "shap": SHAP reasoning

Valid only for tree-based classification models.
Defaults to "shap".

top.k.attributions

integer, optional
Outputs the attributions of the top k features that contribute the most. Defaults to 10.

sample.size

integer, optional
Specifies the number of sampled combinations of features.
If set to 0, the value is determined by algorithm heuristically.
Valid only when the trained classification model is for Naive Bayes, Support Vector Machine (SVM), Multilayer Perceptron (MLP) or Multi-class Logistic Regression.
Defaults to 0.

random.state

integer, optional
Specifies the seed for random number generator.

  • 0: Uses the current time (in seconds) as seed.

  • Others: Uses the specified value as seed.

Valid only when the trained classification model is for Naive Bayes, Support Vector Machine (SVM), Multilayer Perceptron (MLP) or Multi-class Logistic Regression.
Defaults to 0.

Format

S3 methods

Value

Predicted values are returned as a DataFrame, structured as follows.

  • ID column name

  • SCORE

  • CONFIDENCE

  • REASON_CODE

Examples

Input data for prediction:

> df.predict
  ID  OUTLOOK   TEMP HUMIDITY WINDY
1  0 Overcast     75   -10000   Yes
2  1     Rain     78       70   Yes
3  2    Sunny -10000       NA   Yes
4  3    Sunny     69       70   Yes
5  4     Rain     NA       70   Yes
6  5     <NA>     70       70   Yes
7  6      ***     70       70   Yes

Call the predict() function:

> res <- predict(model = uc.dt,
                 data = df.predict,
                 key = "ID",
                 func = "DecisionTree")

Check the result:

> res$Collect()[1:3]
  ID       SCORE CONFIDENCE
1  0        Play  1.0000000
2  1 Do not Play  1.0000000
3  2        Play  0.5000000
4  3        Play  0.5000000
5  4        Play  0.6363636
6  5        Play  0.5000000
7  6        Play  0.5000000
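Once collected, the result is an ordinary R data.frame and can be inspected with base R. The sketch below recreates the output shown above as a local data.frame (so it runs without a database connection) and picks out the low-confidence predictions; the name res.local and the 0.6 threshold are assumptions chosen for illustration.

```r
# Local copy of the collected result shown above, so the example runs
# without a database connection. `res.local` is a hypothetical name.
res.local <- data.frame(
  ID = 0:6,
  SCORE = c("Play", "Do not Play", "Play", "Play", "Play", "Play", "Play"),
  CONFIDENCE = c(1, 1, 0.5, 0.5, 0.6363636, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Rows whose confidence falls below an arbitrary threshold of 0.6.
uncertain <- res.local[res.local$CONFIDENCE < 0.6, ]
uncertain$ID

# Number of predictions per class.
table(res.local$SCORE)
```

Here the low-confidence rows are IDs 2, 3, 5 and 6, which correspond to input rows with missing or unusual feature values.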

See also