hanaml.Entropy.Rd
hanaml.Entropy is an R wrapper for SAP HANA PAL Entropy.
hanaml.Entropy(
data = NULL,
col = NULL,
distinct.value.count.detail = NULL,
thread.ratio = NULL
)
DataFrame
DataFrame containing the data.
Attributes with continuous data type are ignored.
character or list of characters, optional
Names of the columns to be processed.
If not provided, it defaults to all columns of data.
logical, optional
Indicates whether to output the details of distinct
value counts. If FALSE, the detailed distinct value
counts are not output.
Defaults to TRUE.
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads.
Values between 0 and 1 will use up to
that percentage of available threads. Values outside this
range are ignored.
Defaults to 0.
Returns a list of 2 DataFrames.
DataFrame 1
COLUMN_NAME: Name of the column
ENTROPY: Entropy of the column
COUNT_OF_DISTINCT_VALUES: Count of distinct values
DataFrame 2
COLUMN_NAME: Name of columns
DISTINCT_VALUE: Distinct values of columns
COUNT: Count of each distinct value
This function is used to calculate the information entropy of attributes.
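For reference, a standard definition of information (Shannon) entropy for a column with k distinct values is sketched below. The symbols n_i, N and p_i are introduced here for illustration only. The natural logarithm is an assumption: this page does not state the base, but it is consistent with the ENTROPY values shown in the example output.

```latex
H(X) = -\sum_{i=1}^{k} p_i \ln p_i ,
\qquad p_i = \frac{n_i}{N} ,
\qquad N = \sum_{i=1}^{k} n_i
```

Here n_i is the count of the i-th distinct value of the column and N is the total of those counts.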
> data$Head(5)$Collect()
OUTLOOK TEMP HUMIDITY WINDY CLASS
1 Sunny 75 70 Yes Play
2 Sunny NA 90 Yes Do not Play
3 Sunny 85 NA No Do not Play
4 Sunny 72 95 No Do not Play
5 <NA> NA 70 <NA> Play
Call the function:
> result <- hanaml.Entropy(data=data, col = c("TEMP", "WINDY"))
Results:
> result[[1]]$Collect()
COLUMN_NAME ENTROPY COUNT_OF_DISTINCT_VALUES
1 TEMP 2.2538576 10
2 WINDY 0.6901857 2
> result[[2]]$Collect()
COLUMN_NAME DISTINCT_VALUE COUNT
1 TEMP 75 2
2 TEMP 85 1
3 TEMP 72 2
4 TEMP 83 1
5 TEMP 64 1
6 TEMP 81 1
7 TEMP 71 1
8 TEMP 65 1
9 TEMP 68 1
10 TEMP 70 1
11 WINDY Yes 6
12 WINDY No 7
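As a cross-check, the ENTROPY column can be recomputed locally from the distinct value counts in the second DataFrame. This is a minimal base R sketch, assuming Shannon entropy with the natural logarithm. That assumption is inferred from the numbers above and is not stated by this page. `shannon_entropy` is a hypothetical helper and is not part of the package.

```r
# Shannon entropy (natural log) from a vector of distinct-value counts.
# shannon_entropy is a hypothetical helper, not a hanaml function.
shannon_entropy <- function(counts) {
  p <- counts / sum(counts)  # relative frequency of each distinct value
  p <- p[p > 0]              # drop zero counts so 0 * log(0) is never evaluated
  -sum(p * log(p))           # log() is the natural logarithm in R
}

# Counts copied from the COUNT column of the second result DataFrame.
temp_counts  <- c(2, 1, 2, 1, 1, 1, 1, 1, 1, 1)
windy_counts <- c(Yes = 6, No = 7)

temp_entropy  <- shannon_entropy(temp_counts)
windy_entropy <- shannon_entropy(windy_counts)

print(round(c(TEMP = temp_entropy, WINDY = windy_entropy), 7))
```

With these counts the sketch gives values that agree with the ENTROPY column of the first DataFrame to the displayed precision.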