Synthetic Minority Over-sampling Technique

hanaml.SMOTE is an R wrapper for SAP HANA PAL SMOTE.

hanaml.SMOTE(
  data,
  key = NULL,
  features = NULL,
  label = NULL,
  thread.ratio = NULL,
  random.state = NULL,
  n.neighbors = NULL,
  minority.class = NULL,
  smote.amount = NULL,
  algorithm = NULL
)

Arguments

data	`DataFrame` DataFrame containing the data.
key	`character, optional` Name of the ID column. If not provided, the data is assumed to have no ID column. No default value.
features	`character or list of characters, optional` Names of the feature columns. If not provided, defaults to all non-key, non-label columns of data.
label	`character` Specifies the dependent variable by name.
thread.ratio	`double, optional` Controls the proportion of available threads that can be used by this function. The value range is from 0 to 1, where 0 indicates a single thread and 1 indicates all available threads. Values between 0 and 1 use up to that percentage of available threads. Values outside this range are ignored. Defaults to 0.
random.state	`double, optional` Specifies the seed for random number generation. A value of 0 uses the current system time as the seed; any other value is used directly as the seed. Defaults to 0.
n.neighbors	`integer, optional` Specifies the number of nearest neighbors. Defaults to 1.
minority.class	`character, optional` Specifies the targeted minority class value in the dependent variable column. When minority.class is not specified, all classes except the majority class are re-sampled to match the majority class sample count.
smote.amount	`integer, optional` Only valid when minority.class is specified. Specifies the amount of oversampling as a percentage of the minority class size; for example, 200 means 200%, that is, two synthetic samples per minority class sample. When not specified, the algorithm generates samples until the minority class sample count matches the majority class sample count.
algorithm	`c("brute-force", "kd-tree"), optional` Specifies the search algorithm used to find the nearest neighbors. `"brute-force"` uses a brute-force search; `"kd-tree"` uses a k-d tree search. Defaults to "brute-force".

Value

DataFrame
The dataset after sampling. The output table has the same structure as the input table.

Details

SMOTE is a sampling method that oversamples the minority class to prepare the dataset for further applications. It creates new instances by taking each minority class sample and building convex combinations with the k nearest neighboring samples of the minority class.
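
The convex-combination step described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the PAL implementation: the function name `smote_sketch` and its arguments `n_neighbors`, `n_per_sample`, and `seed` are hypothetical, neighbors are found by brute force with Euclidean distance, and every feature is treated as continuous.

```r
# Illustrative sketch only: smote_sketch() is a hypothetical helper,
# not part of hana.ml.r or SAP HANA PAL.
smote_sketch <- function(x, n_neighbors = 1, n_per_sample = 1, seed = 1) {
  set.seed(seed)
  x <- as.matrix(x)
  stopifnot(nrow(x) >= 2)                  # need at least one neighbor
  d <- as.matrix(dist(x))                  # pairwise Euclidean distances
  diag(d) <- Inf                           # a sample is not its own neighbor
  k <- min(n_neighbors, nrow(x) - 1)
  synthetic <- matrix(NA_real_, nrow = 0, ncol = ncol(x),
                      dimnames = list(NULL, colnames(x)))
  for (i in seq_len(nrow(x))) {
    nn <- order(d[i, ])[seq_len(k)]        # indices of the k nearest neighbors
    for (j in seq_len(n_per_sample)) {
      nb <- nn[sample.int(length(nn), 1)]  # pick one neighbor at random
      gap <- runif(1)                      # interpolation weight in [0, 1]
      # Convex combination: a point on the segment between sample i and nb
      new_point <- x[i, ] + gap * (x[nb, ] - x[i, ])
      synthetic <- rbind(synthetic, new_point)
    }
  }
  rownames(synthetic) <- NULL
  synthetic
}

# Minority-class rows (features only) from a small local data frame
minority <- data.frame(X1 = c(3, 8, 8),
                       X2 = c(10, 1000, 999),
                       X3 = c(5.5, 9.4, 7.4))
new_rows <- smote_sketch(minority, n_neighbors = 2, n_per_sample = 2)
```

Because each synthetic point lies on a segment between two existing minority samples, every generated value stays within the range spanned by the minority class in each feature.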

Examples

> data.df$Collect()
   X1   X2   X3 TYPE
1   2    1 3.50    1
2   3   10 7.60    1
3   3   10 5.50    2
4   3   10 4.70    1
5   7 1000 8.50    1
6   8 1000 9.40    2
7   6 1000 0.34    1
8   8  999 7.40    2
9   7  999 3.50    1
10  6 1000 7.00    1

Call the function:

> result <- hanaml.SMOTE(data=data.df, thread.ratio = 1, random.state = 1,
                         label = "TYPE", minority.class = "2",
                         smote.amount = 200, n.neighbors = 2,
                         algorithm = "kd-tree")

Results:

> result$Collect()
   X1        X2       X3 TYPE
1   2    1.0000 3.500000    1
2   3   10.0000 7.600000    1
3   3   10.0000 5.500000    2
4   3   10.0000 4.700000    1
5   7 1000.0000 8.500000    1
6   8 1000.0000 9.400000    2
7   6 1000.0000 0.340000    1
8   8  999.0000 7.400000    2
9   7  999.0000 3.500000    1
10  6 1000.0000 7.000000    1
11  8  973.1091 7.350260    2
12  7  888.0711 8.959068    2
13  8  999.0567 7.513491    2
14  8  999.5123 8.424672    2
15  4  131.5100 5.733437    2
16  5  340.7139 6.135345    2
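
As a quick sanity check on the output above, the class counts can be compared locally. The sketch below copies only the TYPE column from the two tables; the variable names are illustrative. Reading `smote.amount = 200` as a percentage of the minority class size is an assumption here, although the counts are consistent with it.

```r
# TYPE column of the input and of the result, copied from the tables above
type_in  <- c(1, 1, 2, 1, 1, 2, 1, 2, 1, 1)
type_out <- c(type_in, rep(2, 6))

n_minority  <- sum(type_in == 2)               # minority rows before sampling
n_synthetic <- sum(type_out == 2) - n_minority # rows added by SMOTE
ratio_pct   <- 100 * n_synthetic / n_minority  # synthetic rows as a percentage
```

The original ten rows are returned unchanged, and all appended rows carry the minority class label.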
