predict.KNNClassifier.Rd
Make Predictions from a "KNNClassifier" Object
# S3 method for KNNClassifier
predict(
model,
data,
key,
features = NULL,
stat.info = NULL,
thread.ratio = NULL,
interpret = FALSE,
sample.size = NULL,
top.k.attributions = NULL,
random.state = NULL
)
S3
methods
model
R6Class object
A "KNNClassifier" object for prediction.
data
DataFrame
DataFrame containing the data.
key
character
Name of the ID column.
features
character or list of characters, optional
Name of feature columns for prediction.
If not provided, it defaults to all non-key columns of data.
stat.info
logical, optional
Controls whether to return a statistic information table containing
the distance between each point in the prediction set and its
k nearest neighbors in the training set.
If TRUE, the statistics table will be returned non-empty.
Defaults to TRUE.
thread.ratio
double, optional
Controls the proportion of available threads that this function can use.
The value ranges from 0 to 1, where 0 indicates a single thread
and 1 indicates all available threads.
Values between 0 and 1 use up to that percentage of available threads.
Values outside this range are ignored.
Defaults to 0.
interpret
logical, optional
Controls whether or not to interpret the prediction result.
Defaults to FALSE.
sample.size
integer, optional
Specifies the number of sampled combinations of features.
0 means the number is determined heuristically by the algorithm.
Valid only when interpret is TRUE.
Defaults to 0.
top.k.attributions
integer, optional
Specifies the number of features with the highest attributions to output.
Valid only when interpret is TRUE.
Defaults to 10.
random.state
integer, optional
Specifies the seed for random number generation.
0: uses the current time as the seed.
others: uses the given value as the seed.
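The interpretation-related arguments are meant to be used together: sample.size and top.k.attributions only take effect when interpret is TRUE, and a non-zero random.state makes the sampling repeatable. The sketch below shows one way to bundle them. The wrapper name `predict_with_reasons` and its default values are hypothetical and not part of the package; calling it for real requires a fitted "KNNClassifier" object and a DataFrame on an open connection.

```r
# Hypothetical convenience wrapper (not part of the package): calls
# predict() with interpretation enabled and a fixed, non-zero seed.
predict_with_reasons <- function(knc, df.pred, key = "ID",
                                 top.k = 3L, seed = 2024L) {
  predict(
    model = knc,
    data = df.pred,
    key = key,
    interpret = TRUE,            # required for the two arguments below
    sample.size = 0L,            # 0: let the algorithm choose heuristically
    top.k.attributions = top.k,  # keep only the strongest attributions
    random.state = seed          # non-zero seed for repeatable sampling
  )
}
```

With a fitted model `knc` and prediction data `df.pred`, `predict_with_reasons(knc, df.pred)` would be expected to return prediction results that include the REASON_CODE column.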
Returns a list of DataFrames:
DataFrame 1
: Prediction results, structured as follows.
ID column, with the same name and type as data's ID column.
TARGET column, type NVARCHAR, predicted class labels.
REASON_CODE column, type NVARCHAR, interpretation of the prediction results.
Available only when interpret is TRUE.
DataFrame 2
: Statistics of the prediction results:
the distance between each point in `data` and its k nearest
neighbors in the training set. Only returned if stat.info is TRUE.
TEST_ + data's ID name, with the same type as data's ID column, ID of the query point.
K, type INTEGER, K number.
TRAIN_ + training data's ID name, with the same type as the training data's ID column, ID of the neighbor point.
DISTANCE, type DOUBLE, distance.
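To make the layout of the statistics table concrete, the following base R sketch builds a table of the same shape locally. It is an illustration only, not the library's implementation: the toy data are invented, Euclidean distance is assumed, K is assumed to be the neighbor rank from 1 to k, and both tables are assumed to share the key name "ID". The function name `knn_stat_table` is hypothetical.

```r
# Toy data, invented for illustration only.
train <- data.frame(ID = 0:3, X1 = c(1, 2, 8, 9), X2 = c(1, 2, 8, 9))
test  <- data.frame(ID = 0:1, X1 = c(1, 9), X2 = c(1.5, 8.5))

# Hypothetical helper: one row per (query point, neighbor rank),
# assuming Euclidean distance on numeric feature columns.
knn_stat_table <- function(train, test, key, features, k) {
  train.mat <- as.matrix(train[, features])
  rows <- lapply(seq_len(nrow(test)), function(i) {
    q <- unlist(test[i, features])
    # Subtract the query point from every training row, then take norms.
    d <- unname(sqrt(rowSums(sweep(train.mat, 2, q)^2)))
    nearest <- order(d)[seq_len(k)]
    out <- data.frame(test[[key]][i], seq_len(k),
                      train[[key]][nearest], d[nearest])
    names(out) <- c(paste0("TEST_", key), "K",
                    paste0("TRAIN_", key), "DISTANCE")
    out
  })
  do.call(rbind, rows)
}

stats <- knn_stat_table(train, test, key = "ID",
                        features = c("X1", "X2"), k = 2)
print(stats)
```

Each query point contributes k rows, so the table has nrow(test) * k rows in total.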
DataFrame df.pred for prediction:
> df.pred
ID X1 X2 X3
1 0 2 1 A
2 1 1 10 C
3 2 1 11 B
4 3 3 15000 C
5 4 2 1000 C
6 5 1 1001 A
7 6 1 999 A
8 7 3 999 B
Call the function using a "KNNClassifier" object knc:
> res <- predict(model = knc,
data = df.pred,
key = "ID",
features = c("X1", "X2", "X3"),
stat.info = FALSE)
> res$Collect()
ID TARGET
1 0 10
2 1 10
3 2 10
4 3 1
5 4 1
6 5 1
7 6 10
8 7 99
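Once the predictions are collected, they can be summarized with base R. The sketch below assumes that Collect() yields an ordinary data.frame and re-creates the output shown above by hand so that it runs without a database connection; the variable names are illustrative.

```r
# Predictions copied from the example output above.
# TARGET is NVARCHAR, so the labels are kept as character strings.
res.df <- data.frame(
  ID = 0:7,
  TARGET = c("10", "10", "10", "1", "1", "1", "10", "99"),
  stringsAsFactors = FALSE
)

# Number of query points assigned to each class label.
class.counts <- table(res.df$TARGET)
print(class.counts)

# IDs of the rows predicted as class "10".
ids.class.10 <- res.df$ID[res.df$TARGET == "10"]
print(ids.class.10)
```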