Similar to other transform methods, this function transforms input data using a fitted "CATPCA" object.

# S3 method for CATPCA
transform(
  model,
  data,
  key,
  features = NULL,
  n.components = NULL,
  thread.ratio = NULL,
  ignore.unknown.category = NULL,
  ...
)

Arguments

model

CATPCA R6 class
The fitted model used to transform the input data.

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

features

character or list of characters, optional
Names of the feature columns used for the transformation.
If not provided, it defaults to all non-key columns of data.
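
As a rough illustration of this default, the feature set can be thought of as every column name except the key. The sketch below is plain base R applied to the column names from the Examples section; all.columns and default.features are illustrative names, not part of the package, and the package may resolve the default differently internally.

```r
# Column names of the example input data (see Examples).
all.columns <- c("ID", "X1", "X2", "X3", "X4", "X5", "X6")
key <- "ID"

# Assumed default: every column that is not the key column.
default.features <- setdiff(all.columns, key)
print(default.features)
```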

n.components

integer, optional
Number of components to be retained.
The valid range is from 1 to the number of given component loadings.
Defaults to the number of given component loadings.

thread.ratio

double, optional
Specifies the ratio of available threads to use. The value range is [0, 1].
0 means 1 thread, while 1 means all available threads.
Defaults to 1.0.

ignore.unknown.category

logical, optional
Specifies whether to ignore unknown categories in data during the transformation.
If set to FALSE, an error is raised when any unknown category is encountered; otherwise, unknown categories are ignored and assigned a quantification of 0.
Defaults to FALSE.
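
To decide whether this option is needed, one possible local pre-check is to compare the categories in the new data with those seen at fitting time. The sketch below is base R only; known.categories and new.values are hypothetical vectors made up for illustration, and this is not how the package itself detects unknown categories.

```r
# Hypothetical categories seen when the model was fitted (assumption).
known.categories <- c("A", "B", "C")

# Hypothetical categorical column of the data to transform;
# "D" is a made-up value that was not seen during fitting.
new.values <- c("A", "B", "C", "A", "D")

# Categories present in the new data but absent from the fitted ones.
unknown <- setdiff(unique(new.values), known.categories)

if (length(unknown) > 0) {
  message("Unknown categories: ", paste(unknown, collapse = ", "))
}
```

If unknown is non-empty, calling transform with ignore.unknown.category = FALSE would be expected to raise an error, as described above.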

...

Reserved parameter.

Value

DataFrame
Transformed component score values for all points in the input data, structured as follows:

  • ID column, with same name and type as the ID column in data.

  • COMPONENT_ID, type INTEGER, representing categorical PCA component IDs.

  • COMPONENT_SCORE, type DOUBLE, holding the component score values for all points in data.

Examples

The following example transforms a DataFrame using the "CATPCA" object cpc.
Input data for transformation:


> data2$Collect()
  ID X1 X2 X3 X4 X5 X6
1  1 12  A 20 44 48 16
2  2 12  B 25 45 50 16
3  3 12  C 21 45 50 16
4  4 13  A 21 46 51 17
5  5 14  C 24 46 51 17
6  6 22  A 25 54 58 26

Call the function:


> result <- transform(cpc, data2,
                      key="ID", n.components=2,
                      thread.ratio = 0.5,
                      ignore.unknown.category=FALSE)

Output:


> result$Collect()
   ID COMPONENT_ID COMPONENT_SCORE
1   1            1      2.73451825
2   2            1      1.05566374
3   3            1      1.91871148
4   4            1      1.76884062
5   5            1      1.01998751
6   6            1     -2.38612629
7   1            2     -0.74763131
8   2            2      1.70968717
9   3            2     -0.06488889
10  4            2     -0.76744182
11  5            2      0.55794762
12  6            2     -1.09488511
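
The result is in long format, with one row per ID and component. If one row per ID is more convenient, the collected data can be reshaped locally. The sketch below uses base R reshape() on a hand-typed copy of the output above; in a live session scores would instead come from result$Collect(). The names scores and wide are illustrative only.

```r
# Local copy of the collected result shown above.
scores <- data.frame(
  ID = rep(1:6, times = 2),
  COMPONENT_ID = rep(1:2, each = 6),
  COMPONENT_SCORE = c(
    2.73451825, 1.05566374, 1.91871148,
    1.76884062, 1.01998751, -2.38612629,
    -0.74763131, 1.70968717, -0.06488889,
    -0.76744182, 0.55794762, -1.09488511
  )
)

# Long to wide: one row per ID, one score column per component.
wide <- reshape(scores,
                idvar = "ID",
                timevar = "COMPONENT_ID",
                direction = "wide",
                sep = "_")
print(wide)
```

The resulting columns are ID, COMPONENT_SCORE_1, and COMPONENT_SCORE_2.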

See also