hanaml.Entropy is an R wrapper for SAP HANA PAL Entropy.
hanaml.Entropy( data = NULL, col = NULL, distinct.value.count.detail = NULL, thread.ratio = NULL )
| data | |
|---|---|
| col | |
| distinct.value.count.detail | |
| thread.ratio | |
Returns a list of two DataFrames.
DataFrame 1
COLUMN_NAME Name of columns
ENTROPY Entropy of the column
COUNT_OF_DISTINCT_VALUES Count of distinct values
DataFrame 2
COLUMN_NAME Name of columns
DISTINCT_VALUE Distinct values of columns
COUNT Count of each distinct value
This function is used to calculate the information entropy of attributes.
> data$Head(5)$Collect()
  OUTLOOK TEMP HUMIDITY WINDY       CLASS
1   Sunny   75       70   Yes        Play
2   Sunny   NA       90   Yes Do not Play
3   Sunny   85       NA    No Do not Play
4   Sunny   72       95    No Do not Play
5    <NA>   NA       70  <NA>        Play
Call the function:
> result <- hanaml.Entropy(data=data, col = c("TEMP", "WINDY"))
Results:
> result[[1]]$Collect()
  COLUMN_NAME   ENTROPY COUNT_OF_DISTINCT_VALUES
1        TEMP 2.2538576                       10
2       WINDY 0.6901857                        2
> result[[2]]$Collect()
   COLUMN_NAME DISTINCT_VALUE COUNT
1         TEMP             75     2
2         TEMP             85     1
3         TEMP             72     2
4         TEMP             83     1
5         TEMP             64     1
6         TEMP             81     1
7         TEMP             71     1
8         TEMP             65     1
9         TEMP             68     1
10        TEMP             70     1
11       WINDY            Yes     6
12       WINDY             No     7
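As a sanity check, the ENTROPY values can be recomputed locally from the per-value counts in the second result table. The sketch below uses base R only and does not call SAP HANA. The helper name shannon_entropy is hypothetical, and the use of the natural logarithm is an assumption inferred from the numbers shown here, not something this page states.

```r
# Counts copied from the second result table (COUNT column).
temp_counts  <- c(2, 1, 2, 1, 1, 1, 1, 1, 1, 1)  # TEMP: 10 distinct values
windy_counts <- c(Yes = 6, No = 7)               # WINDY: 2 distinct values

# Hypothetical helper: Shannon entropy of a vector of counts.
# Assumption: natural logarithm (base e), inferred from the output above.
shannon_entropy <- function(counts) {
  p <- counts / sum(counts)  # relative frequency of each distinct value
  -sum(p * log(p))
}

shannon_entropy(temp_counts)
shannon_entropy(windy_counts)
```

Under that assumption, the two calls agree with the reported 2.2538576 and 0.6901857 to the precision shown. The counts sum to 12 for TEMP and 13 for WINDY, which suggests that missing values are left out of the calculation, although this page does not say so explicitly.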