hanaml.Partition {hana.ml.r}    R Documentation

Partition

Description

hanaml.Partition is an R wrapper for the PAL Partition algorithm.

Usage

hanaml.Partition(conn.context,
                 data,
                 key,
                 features = NULL,
                 random.state = NULL,
                 thread.ratio = NULL,
                 method = NULL,
                 stratified.column = NULL,
                 split.ratio = NULL,
                 split.size = NULL)

Arguments

conn.context

ConnectionContext
Database connection object

data

DataFrame
Dataset to be partitioned.

key

character
Name of the ID column.

features

list of characters, optional
Names of the feature columns. If not provided, it defaults to all the non-ID and non-label columns.

random.state

integer, optional

Indicates the seed used to initialize the random number generator.

  • 0: Uses the system time

  • Not 0: Uses the specified seed
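
The effect of a fixed seed can be sketched in base R. The helper below, shuffle_ids, is hypothetical and works on a local vector; it only mirrors the rule described above (0 seeds from the system time, any other value is used as the seed) and makes no claim about the random number generator PAL uses internally.

```r
# Hypothetical helper, not part of hana.ml.r: shuffles a local vector of IDs.
shuffle_ids <- function(ids, random.state = 0L) {
  if (random.state == 0L) {
    # Assumption: approximate "uses the system time" with the current clock.
    set.seed(as.integer(Sys.time()))
  } else {
    set.seed(random.state)
  }
  ids[sample.int(length(ids))]
}

# The same non-zero seed reproduces the same order.
first  <- shuffle_ids(0:29, random.state = 23L)
second <- shuffle_ids(0:29, random.state = 23L)
identical(first, second)
```

A non-zero random.state is therefore the natural choice when a partition needs to be reproducible across runs.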

thread.ratio

double, optional
Specifies the ratio of the total number of threads that this function can use. The value ranges from 0 to 1, where 0 means using only 1 thread and 1 means using at most all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.

method

character, optional

Partition method used for splitting the dataset into training, test, and validation sets:

  • "random": random partition

  • "stratified": stratified partition

Defaults to "random".

stratified.column

character, optional
Indicates which column is used for stratification in the partition process.
Required and valid only when method is set to "stratified".
No default value.
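
The idea behind stratification can be illustrated with a small base R sketch on a local data.frame. Everything here is an assumption made for illustration: the helper names label_rows and stratified_labels, the rounding rule, and the toy data are not part of hana.ml.r, and PAL's own procedure may differ.

```r
# Illustration only: base R on a local data.frame, not the PAL implementation.

# Randomly label n rows as train/test/validation according to split.ratio.
# Rounding the cumulative boundaries is an assumption made for this sketch.
label_rows <- function(n, split.ratio) {
  bounds <- round(cumsum(split.ratio) * n)
  bounds[3] <- n
  counts <- diff(c(0, bounds))
  labels <- rep(c("train", "test", "validation"), times = counts)
  labels[sample.int(n)]
}

# Stratified: label rows separately within each level of `column`.
stratified_labels <- function(df, column, split.ratio, seed) {
  set.seed(seed)
  part <- character(nrow(df))
  for (rows in split(seq_len(nrow(df)), df[[column]])) {
    part[rows] <- label_rows(length(rows), split.ratio)
  }
  part
}

# Made-up toy data: 20 "NO" rows and 10 "YES" rows.
toy <- data.frame(ID = 0:29,
                  DefaultedBorrower = rep(c("NO", "YES"), times = c(20, 10)))
toy$part <- stratified_labels(toy, "DefaultedBorrower",
                              c(0.6, 0.2, 0.2), seed = 23)

# Cross-tabulate partition labels against the stratification column.
table(toy$part, toy$DefaultedBorrower)
```

Because rows are labelled within each stratum, every partition keeps the same share of "YES" rows as the full dataset; a purely random split matches those proportions only on average.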

split.ratio

list of numeric, optional
Vector of 3 numbers that specify the proportions of data used for training, testing, and validation, respectively. If not provided, defaults to c(0.8, 0.1, 0.1), i.e. 80 percent of the data is used for training, 10 percent for testing, and 10 percent for validation.
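
As a worked example of the arithmetic, the sketch below converts split ratios into row counts in base R. The function name ratio_to_counts and its rounding rule are assumptions for illustration only; how PAL handles fractional row counts is not stated in this page.

```r
# Hypothetical helper, not part of hana.ml.r.
ratio_to_counts <- function(n.rows, split.ratio = c(0.8, 0.1, 0.1)) {
  stopifnot(length(split.ratio) == 3,
            all(split.ratio >= 0),
            isTRUE(all.equal(sum(split.ratio), 1)))
  # Round cumulative boundaries so the counts always add up to n.rows
  # (an assumed rounding rule, chosen for this sketch).
  bounds <- round(cumsum(split.ratio) * n.rows)
  bounds[3] <- n.rows
  counts <- diff(c(0, bounds))
  names(counts) <- c("train", "test", "validation")
  counts
}

# 30 rows split 60/20/20.
counts <- ratio_to_counts(30, c(0.6, 0.2, 0.2))
counts
```

With 30 rows and c(0.6, 0.2, 0.2), the training count is 18, which agrees with the 18 rows shown for partition[[1]] in the Examples section.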

split.size

list of integers, optional
Vector of 3 integers that specify the number of rows in data used for training, testing, and validation, respectively.

Format

R6Class object.

Value

Examples

## Not run: 
   Input DataFrame for Preprocessing:
> data$Collect()
     ID HomeOwner MaritalStatus AnnualIncome DefaultedBorrower
1   0       YES        Single          125                NO
2   1        NO       Married          100                NO
3   2        NO        Single           70                NO
4   3       YES       Married          120                NO
5   4        NO      Divorced           95               YES
...
28 27        NO        Single           85               YES
29 28        NO       Married           75               YES
30 29        NO        Single           90               YES

 Create partition instance:
 > partition <- hanaml.Partition(conn, data, key = "ID", random.state = 23,
                                 method = "random",
                                 split.ratio = c(0.6, 0.2, 0.2))
Expected output:

 > partition[[1]]$Collect()
    ID HomeOwner MaritalStatus AnnualIncome DefaultedBorrower
 1   0       YES        Single          125                NO
 2   1        NO       Married          100                NO
 3   3       YES       Married          120                NO
 4   5        NO       Married           60                NO
 5   7        NO        Single           85               YES
 6  10       YES        Single          125                NO
 7  12        NO        Single           70                NO
 8  13       YES       Married          120                NO
 9  17        NO        Single           85               YES
 10 18        NO       Married           75                NO
 11 21        NO       Married          100                NO
 12 22        NO        Single           70                NO
 13 23       YES       Married          120                NO
 14 24        NO      Divorced           95               YES
 15 25        NO       Married           60                NO
 16 27        NO        Single           85               YES
 17 28        NO       Married           75               YES
 18 29        NO        Single           90               YES

## End(Not run)

[Package hana.ml.r version 1.0.8 Index]