| hanaml.UnivariateAnalysis {hana.ml.r} | R Documentation |
hanaml.UnivariateAnalysis is an R wrapper for PAL Univariate Analysis.
hanaml.UnivariateAnalysis(conn.context, data,
                          key = NULL, cols = NULL,
                          categorical.variable = NULL,
                          significance.level = NULL,
                          trimmed.percentage = NULL)
conn.context |
|
data |
|
key |
|
cols |
|
categorical.variable |
|
significance.level |
|
trimmed.percentage |
|
Provides an overview of the dataset. For continuous columns, it provides the count of valid observations, min, lower quartile, median, upper quartile, max, mean, confidence interval for the mean (lower and upper bound), trimmed mean, variance, standard deviation, skewness, and kurtosis. For discrete columns, it provides the number of occurrences and the percentage of the total data in each category.
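As a rough local cross-check of what the continuous statistics mean, the sketch below recomputes a few of them in base R for the X1 values used in the Examples section. It is not PAL's implementation: the t-based confidence interval and the symmetric trimming rule are assumptions, chosen because they closely match the mean, confidence bounds, trimmed mean, variance, and standard deviation shown for X1 in the example output. The names alpha and trim are illustrative stand-ins for significance.level and trimmed.percentage.

```r
# X1 values copied from the Examples section of this page.
x1 <- c(1.2, 2.5, 5.2, -10.2, 8.5, 100.0)

alpha <- 0.05  # stand-in for significance.level (assumed two-sided)
trim  <- 0.2   # stand-in for trimmed.percentage (assumed per-tail fraction)

n   <- sum(!is.na(x1))  # count of valid observations
avg <- mean(x1)
s   <- sd(x1)           # sample standard deviation (n - 1 denominator)

# Assumed formula: two-sided t-based confidence interval for the mean.
half <- qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
ci   <- c(lower = avg - half, upper = avg + half)

# mean(trim = ) drops floor(n * trim) observations from each end.
tmean <- mean(x1, trim = trim)

print(round(c(n = n, mean = avg, ci, trimmed = tmean,
              variance = var(x1), sd = s), 6))
```

Quartiles, skewness, and kurtosis are left out on purpose, since several competing definitions exist and this page does not say which one PAL uses.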
Returns a result object containing two DataFrames:
continuous result: DataFrame
Statistics for continuous variables.
categorical result: DataFrame
Statistics for categorical variables.
## Not run:
DataFrame df to be analyzed:
> df$Collect()
     X1 X2 X3 X4
1   1.2 NA  1  A
2   2.5 NA  2  C
3   5.2 NA  3  A
4 -10.2 NA  2  A
5   8.5 NA  2  C
6 100.0 NA  3  B
Perform univariate analysis:
> output <- hanaml.UnivariateAnalysis(conn, df, categorical.variable='X3',
                                      significance.level=0.05,
                                      trimmed.percentage=0.2)
> output[[1]]
   VARIABLE_NAME     CATEGORY     STAT_NAME STAT_VALUE
 1            X3 __PAL_NULL__         count    0.00000
 2            X3 __PAL_NULL__ percentage(%)    0.00000
 3            X3            1         count    1.00000
 4            X3            1 percentage(%)   16.66667
 5            X3            2         count    3.00000
 6            X3            2 percentage(%)   50.00000
 7            X3            3         count    2.00000
 8            X3            3 percentage(%)   33.33333
 9            X4 __PAL_NULL__         count    0.00000
10            X4 __PAL_NULL__ percentage(%)    0.00000
11            X4            A         count    3.00000
12            X4            A percentage(%)   50.00000
13            X4            B         count    1.00000
14            X4            B percentage(%)   16.66667
15            X4            C         count    2.00000
16            X4            C percentage(%)   33.33333
> output[[2]]
   VARIABLE_NAME                STAT_NAME  STAT_VALUE
 1            X1       valid observations    6.000000
 2            X1                      min  -10.200000
 3            X1           lower quartile    1.200000
 4            X1                   median    3.850000
 5            X1           upper quartile    8.500000
 6            X1                      max  100.000000
 7            X1                     mean   17.866667
 8            X1 CI for mean, lower bound  -24.879549
 9            X1 CI for mean, upper bound   60.612883
10            X1             trimmed mean    4.350000
11            X1                 variance 1659.142667
12            X1       standard deviation   40.732575
13            X1                 skewness    1.688495
14            X1                 kurtosis    1.036148
15            X2       valid observations    0.000000
## End(Not run)
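The per-category counts and percentages in the example output can be reproduced locally with base R, which may help when sanity-checking a result without a HANA connection. This is only a sketch: category_stats is a hypothetical helper, not part of hana.ml.r, and it uses one row per category rather than PAL's two-rows-per-category layout. It also omits the __PAL_NULL__ rows, because table() drops missing values by default.

```r
# X3 and X4 values copied from the example DataFrame above.
x3 <- c(1, 2, 3, 2, 2, 3)
x4 <- c("A", "C", "A", "A", "C", "B")

# Hypothetical helper (not part of hana.ml.r): count and percentage per
# category for one column, one row per category.
category_stats <- function(values, name) {
  counts <- table(values)  # NA values are dropped by default
  data.frame(
    VARIABLE_NAME = name,
    CATEGORY      = names(counts),
    count         = as.vector(counts),
    percentage    = 100 * as.vector(counts) / length(values),
    stringsAsFactors = FALSE
  )
}

local_stats <- rbind(category_stats(x3, "X3"), category_stats(x4, "X4"))
print(local_stats)
```

The percentage here divides by the total number of rows, which is consistent with the example, where the three categories of each variable sum to 100%.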