hanaml.Entropy is an R wrapper for SAP HANA PAL Entropy.

hanaml.Entropy(
  data = NULL,
  col = NULL,
  distinct.value.count.detail = NULL,
  thread.ratio = NULL
)

Arguments

data

DataFrame
DataFrame containing the data. Attributes with a continuous data type are ignored.

col

character or list of characters, optional
Names of the columns to be processed.
If not provided, it defaults to all columns of data.

distinct.value.count.detail

logical, optional
Indicates whether to output the details of distinct value counts. If FALSE, the detailed distinct value counts are not output.
Defaults to TRUE.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

Value

Returns a list of 2 DataFrames.

  • DataFrame 1

    • COLUMN_NAME: Name of the column

    • ENTROPY: Entropy of the column

    • COUNT_OF_DISTINCT_VALUES: Count of distinct values in the column

  • DataFrame 2

    • COLUMN_NAME: Name of the column

    • DISTINCT_VALUE: Distinct value in the column

    • COUNT: Count of each distinct value

Details

This function calculates the information entropy of attributes (columns of data).
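
As an illustration of what "information entropy" means here, the following is a minimal base R sketch, not the PAL implementation. It assumes Shannon entropy with the natural logarithm and assumes that missing values are excluded before counting; the helper name shannon_entropy is hypothetical and is not part of hana.ml.r.

```r
# Hypothetical helper (not part of hana.ml.r): Shannon entropy of a
# discrete vector, using the natural logarithm.
shannon_entropy <- function(x) {
  x <- x[!is.na(x)]                  # assumption: missing values are excluded
  counts <- table(x)                 # count of each distinct value
  counts <- counts[counts > 0]       # drop unused factor levels, if any
  p <- as.numeric(counts) / sum(counts)
  -sum(p * log(p))                   # log() is the natural logarithm in R
}

# Small local vector standing in for a categorical column.
windy <- c("Yes", "Yes", "No", "No", NA)
shannon_entropy(windy)
```

A column with a single distinct value has entropy 0, and a column whose k distinct values are equally frequent has entropy log(k) under these assumptions.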

Examples

Input DataFrame data:


> data$Head(5)$Collect()
  OUTLOOK TEMP HUMIDITY WINDY       CLASS
1   Sunny   75       70   Yes        Play
2   Sunny   NA       90   Yes Do not Play
3   Sunny   85       NA    No Do not Play
4   Sunny   72       95    No Do not Play
5    <NA>   NA       70  <NA>        Play

Call the function:


> result <- hanaml.Entropy(data=data, col = c("TEMP", "WINDY"))

Results:


> result[[1]]$Collect()
  COLUMN_NAME   ENTROPY COUNT_OF_DISTINCT_VALUES
1        TEMP 2.2538576                       10
2       WINDY 0.6901857                        2
> result[[2]]$Collect()
   COLUMN_NAME DISTINCT_VALUE COUNT
1         TEMP             75     2
2         TEMP             85     1
3         TEMP             72     2
4         TEMP             83     1
5         TEMP             64     1
6         TEMP             81     1
7         TEMP             71     1
8         TEMP             65     1
9         TEMP             68     1
10        TEMP             70     1
11       WINDY            Yes     6
12       WINDY             No     7
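
The ENTROPY values above can be cross-checked locally from the COUNT column of the second DataFrame. The sketch below is plain base R and does not call SAP HANA; it assumes Shannon entropy with the natural logarithm, and the helper name entropy_from_counts is hypothetical.

```r
# Counts copied from the second result DataFrame above.
temp_counts  <- c(2, 1, 2, 1, 1, 1, 1, 1, 1, 1)   # TEMP: 10 distinct values
windy_counts <- c(Yes = 6, No = 7)                # WINDY: 2 distinct values

# Hypothetical helper: entropy (natural log) from a vector of counts.
entropy_from_counts <- function(n) {
  p <- n / sum(n)        # relative frequency of each distinct value
  -sum(p * log(p))
}

temp_entropy  <- entropy_from_counts(temp_counts)    # approx. 2.2538576
windy_entropy <- entropy_from_counts(windy_counts)   # approx. 0.6901857
print(c(TEMP = temp_entropy, WINDY = windy_entropy))
```

Both values agree with the ENTROPY column of the first DataFrame, which suggests that the natural logarithm is the base used in this example.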