Make Predictions from a "KNNRegressor" Object

# S3 method for KNNRegressor
predict(
  model,
  data,
  key,
  features = NULL,
  stat.info = NULL,
  thread.ratio = NULL,
  interpret = FALSE,
  sample.size = NULL,
  top.k.attributions = NULL,
  random.state = NULL
)

Format

S3 methods

Arguments

model

R6Class object
A "KNNRegressor" object for prediction.

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

features

character or list of characters, optional
Names of the feature columns used for prediction.
If not provided, it defaults to all non-key columns of data.
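
As a rough illustration of this default, the following base R sketch derives the feature names from a local data.frame. The local data.frame `df.local` is a hypothetical stand-in for the DataFrame and is not part of the package; how the method resolves the default internally is not shown in this page.

```r
# Hypothetical local stand-in for the prediction data (not a DataFrame object).
df.local <- data.frame(ID = 0:2,
                       X1 = c(2, 1, 1),
                       X2 = c(1, 10, 11),
                       X3 = c("A", "C", "B"))
key <- "ID"

# Default described above: every column except the key column.
features <- setdiff(colnames(df.local), key)
print(features)
```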

stat.info

logical, optional
Controls whether to return a statistic information table containing the distance between each point in the prediction set and its k nearest neighbors in the training set.
If TRUE, the statistics table will be returned non-empty.
Defaults to TRUE.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

interpret

logical, optional
Controls whether or not to interpret the prediction result.
Defaults to FALSE.

sample.size

integer, optional
Specifies the number of sampled combinations of features.
0 means the number is determined heuristically by the algorithm.
Valid only when interpret is TRUE.
Defaults to 0.

top.k.attributions

integer, optional
Specifies the number of features with topmost attributions to output.
Valid only when interpret is TRUE.
Defaults to 10.

random.state

integer, optional
Specifies the seed for random number generation.

  • 0: current time.

  • others: the actual seed.

Value

Returns a list of DataFrames:
DataFrame 1: Prediction results, structured as follows.

  • ID column, with same name and type as data's ID column.

  • TARGET column, type NVARCHAR, predicted values.

  • REASON_CODE column, type NVARCHAR, interpretation of the prediction results. Available only when interpret is TRUE.

DataFrame 2: Statistics of the prediction results.
The distance between each point in `data` and its k nearest neighbors in the training set. Only returned if stat.info is TRUE.

  • TEST_ + data's ID name, with same type as data's ID column, query data ID.

  • K, type INTEGER, K number.

  • TRAIN_ + training data's ID name, with same type as training data's ID column, neighbor point's ID.

  • DISTANCE, type DOUBLE, distance.
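
To make the layout of the statistics table concrete, here is a minimal base R sketch that builds a table of the same shape for a single query point. It is not the package's implementation: the training data are invented for illustration, and the sketch assumes Euclidean distance, an unweighted mean of the neighbors' targets, and that K holds the neighbor's rank from 1 to k.

```r
# Hypothetical training set and query point (not from this documentation).
train <- data.frame(ID = 1:5,
                    X1 = c(1, 2, 3, 10, 11),
                    Y  = c(10, 20, 30, 100, 110))
query <- data.frame(ID = 0L, X1 = 2.4)
k <- 3L

# With one feature, the Euclidean distance is the absolute difference.
d <- abs(train$X1 - query$X1)

# Row indices of the k closest training points, nearest first.
nearest <- order(d)[seq_len(k)]

# One row per (query point, neighbor) pair, mirroring DataFrame 2.
stats <- data.frame(TEST_ID  = query$ID,
                    K        = seq_len(k),       # assumed: neighbor rank
                    TRAIN_ID = train$ID[nearest],
                    DISTANCE = d[nearest])

# Assumed aggregation: unweighted mean of the neighbors' target values.
prediction <- mean(train$Y[nearest])

print(stats)
print(prediction)
```

In this sketch the three nearest training points are IDs 2, 3 and 1, so the predicted value is the mean of 20, 30 and 10.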

Examples

DataFrame df.pred for prediction:


> df.pred
   ID X1    X2 X3
 1  0  2     1  A
 2  1  1    10  C
 3  2  1    11  B
 4  3  3 15000  C
 5  4  2  1000  C
 6  5  1  1001  A
 7  6  1   999  A
 8  7  3   999  B

Call the function using a "KNNRegressor" object knr:


> res <- predict(model = knr,
                 data = df.pred,
                 key = "ID",
                 features = c("X1", "X2", "X3"),
                 stat.info = FALSE)

Output:


> res$Collect()
   ID   TARGET
1   0  7.00000
2   1  7.00000
3   2  7.00000
4   3 36.66667
5   4 36.66667
6   5 36.66667
7   6 39.66667
8   7 69.33333
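
When stat.info is TRUE, the Value section describes two DataFrames. The sketch below shows one way the two results might be retrieved. It assumes the returned list holds the prediction DataFrame first and the statistics DataFrame second, which should be checked against the installed package. The wrapper name `predict_with_stats` is hypothetical; it is only defined here, not called, because it needs a fitted knr object and a live connection.

```r
# Hypothetical wrapper: takes a fitted "KNNRegressor" object and a DataFrame.
predict_with_stats <- function(knr, df.pred) {
  res <- predict(model = knr,
                 data = df.pred,
                 key = "ID",
                 features = c("X1", "X2", "X3"),
                 stat.info = TRUE)
  # Assumption: element 1 holds predictions, element 2 holds statistics.
  list(predictions = res[[1]]$Collect(),
       statistics  = res[[2]]$Collect())
}
```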