hanaml.UnivariateAnalysis {hana.ml.r}    R Documentation
hanaml.UnivariateAnalysis is an R wrapper for PAL Univariate Analysis.
hanaml.UnivariateAnalysis(conn.context, data, key = NULL, cols = NULL, categorical.variable = NULL, significance.level = NULL, trimmed.percentage = NULL)
conn.context
data
key
cols
categorical.variable
significance.level
trimmed.percentage
Provides an overview of the dataset. For continuous columns, it provides the count of valid observations, min, lower quartile, median, upper quartile, max, mean, confidence interval for the mean (lower and upper bound), trimmed mean, variance, standard deviation, skewness, and kurtosis. For discrete columns, it provides the number of occurrences and the percentage of the total data in each category.
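To make the listed statistics concrete, the following base R sketch computes most of them locally for columns X1 and X3 of the example data. It does not call PAL and needs no HANA connection. The choice of `quantile(type = 2)`, the t-based confidence interval, and the use of `mean(trim = )` are assumptions that should reproduce the values shown in the example; they are not a statement of how PAL computes them internally. Skewness and kurtosis are omitted because several definitions exist.

```r
# Local illustration in base R; this does not call PAL.
x1 <- c(1.2, 2.5, 5.2, -10.2, 8.5, 100.0)  # column X1 from the example
x3 <- c(1, 2, 3, 2, 2, 3)                  # column X3, treated as categorical

significance.level <- 0.05
trimmed.percentage <- 0.2

n <- sum(!is.na(x1))                       # count of valid observations

# Assumption: quantile type 2 is used here because it agrees with the example
q <- quantile(x1, probs = c(0.25, 0.5, 0.75), type = 2, names = FALSE)

m <- mean(x1)
s <- sd(x1)                                # sample standard deviation (n - 1)

# Assumption: two-sided t-based confidence interval for the mean
half.width <- qt(1 - significance.level / 2, df = n - 1) * s / sqrt(n)
ci <- c(lower = m - half.width, upper = m + half.width)

# Fraction of observations dropped from each end before averaging
tm <- mean(x1, trim = trimmed.percentage)

continuous <- c(n = n, min = min(x1), q1 = q[1], median = q[2], q3 = q[3],
                max = max(x1), mean = m, ci, trimmed.mean = tm,
                variance = var(x1), sd = s)
print(round(continuous, 6))

# Categorical summary: occurrences and percentage of the total per category
counts <- table(x3)
categorical <- data.frame(category   = names(counts),
                          count      = as.vector(counts),
                          percentage = as.vector(100 * prop.table(counts)))
print(categorical)
```

Comparing these local values with the PAL output is a quick sanity check on how `significance.level` and `trimmed.percentage` are being interpreted.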
Returns a result object containing two DataFrames:
continuous result: DataFrame
Statistics for continuous variables.
categorical result: DataFrame
Statistics for categorical variables.
## Not run:
DataFrame df to be analyzed:

> df$Collect()
     X1 X2 X3 X4
1   1.2 NA  1  A
2   2.5 NA  2  C
3   5.2 NA  3  A
4 -10.2 NA  2  A
5   8.5 NA  2  C
6 100.0 NA  3  B

Perform univariate analysis:

> output <- hanaml.UnivariateAnalysis(conn, df,
                                      categorical.variable='X3',
                                      significance.level=0.05,
                                      trimmed.percentage=0.2)

> output[[1]]
   VARIABLE_NAME     CATEGORY     STAT_NAME STAT_VALUE
1             X3 __PAL_NULL__         count    0.00000
2             X3 __PAL_NULL__ percentage(%)    0.00000
3             X3            1         count    1.00000
4             X3            1 percentage(%)   16.66667
5             X3            2         count    3.00000
6             X3            2 percentage(%)   50.00000
7             X3            3         count    2.00000
8             X3            3 percentage(%)   33.33333
9             X4 __PAL_NULL__         count    0.00000
10            X4 __PAL_NULL__ percentage(%)    0.00000
11            X4            A         count    3.00000
12            X4            A percentage(%)   50.00000
13            X4            B         count    1.00000
14            X4            B percentage(%)   16.66667
15            X4            C         count    2.00000
16            X4            C percentage(%)   33.33333

> output[[2]]
   VARIABLE_NAME                STAT_NAME  STAT_VALUE
1             X1       valid observations    6.000000
2             X1                      min  -10.200000
3             X1           lower quartile    1.200000
4             X1                   median    3.850000
5             X1           upper quartile    8.500000
6             X1                      max  100.000000
7             X1                     mean   17.866667
8             X1 CI for mean, lower bound  -24.879549
9             X1 CI for mean, upper bound   60.612883
10            X1             trimmed mean    4.350000
11            X1                 variance 1659.142667
12            X1       standard deviation   40.732575
13            X1                 skewness    1.688495
14            X1                 kurtosis    1.036148
15            X2       valid observations    0.000000

## End(Not run)
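Both results come back in long format, with one row per variable, category, and statistic. If a wide layout is easier to read, a local copy can be pivoted with base R. The sketch below is only an illustration: `cat.stats` is a hypothetical local data.frame holding the X4 rows from the example, typed in by hand rather than fetched from HANA.

```r
# Hypothetical local copy of the X4 rows from the categorical result
cat.stats <- data.frame(
  VARIABLE_NAME = rep("X4", 6),
  CATEGORY      = rep(c("A", "B", "C"), each = 2),
  STAT_NAME     = rep(c("count", "percentage(%)"), times = 3),
  STAT_VALUE    = c(3, 50, 1, 16.66667, 2, 33.33333),
  stringsAsFactors = FALSE
)

# One row per category, one column per statistic; each cell holds a single
# value, so the summation done by xtabs() leaves the values unchanged.
wide <- xtabs(STAT_VALUE ~ CATEGORY + STAT_NAME, data = cat.stats)
print(wide)
```

With several categorical variables in the result, the same call could be applied per `VARIABLE_NAME`, for example after `split(cat.stats, cat.stats$VARIABLE_NAME)`.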