R: Univariate Analysis

hanaml.UnivariateAnalysis {hana.ml.r}

R Documentation

Univariate Analysis

Description

hanaml.UnivariateAnalysis is an R wrapper for PAL Univariate Analysis.

Usage

hanaml.UnivariateAnalysis (conn.context, data,
                          key = NULL, cols = NULL,
                          categorical.variable = NULL,
                          significance.level = NULL,
                          trimmed.percentage = NULL)

Arguments

`conn.context`	`ConnectionContext` The connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character, optional` Name of the ID column of data.
`cols`	`list of characters, optional` List of column names to analyze. If not provided, it defaults to all non-ID columns.
`categorical.variable`	`list of characters, optional` INTEGER columns specified in this list will be treated as categorical data. By default, INTEGER columns are treated as continuous.
`significance.level`	`double, optional` The significance level used to compute the confidence interval of the sample mean. Must be greater than 0 and less than 1. Defaults to 0.05.
`trimmed.percentage`	`double, optional` The fraction of observations dropped from each end (head and tail) of the sorted data when calculating the trimmed mean. Value range is from 0 to 0.5. Defaults to 0.05.

Details

Provides an overview of the dataset. For continuous columns, it provides the count of valid observations, min, lower quartile, median, upper quartile, max, mean, confidence interval for the mean (lower and upper bound), trimmed mean, variance, standard deviation, skewness, and kurtosis. For discrete columns, it provides the number of occurrences and the percentage of the total data in each category.
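
The statistics listed above can be cross-checked locally. The following is a minimal base-R sketch, not the PAL implementation: it assumes a two-sided Student t interval for the mean, the sample (n - 1) variance, and R's own trimming rule in `mean(x, trim = ...)`. Under those assumptions it reproduces the X1 and X3 values shown in the Examples section. The variable names (`x1`, `x3`, `continuous`, `counts`, `percentages`) are illustrative only.

```r
# Local base-R sketch; hanaml.UnivariateAnalysis itself runs inside SAP HANA.
x1 <- c(1.2, 2.5, 5.2, -10.2, 8.5, 100.0)  # column X1 from the example data
x3 <- c(1, 2, 3, 2, 2, 3)                  # column X3, treated as categorical

significance.level <- 0.05
trimmed.percentage <- 0.2

n <- sum(!is.na(x1))   # count of valid (non-missing) observations
x1.mean <- mean(x1)
x1.sd <- sd(x1)        # sample standard deviation (n - 1 denominator)

# Assumption: two-sided Student t confidence interval for the mean.
margin <- qt(1 - significance.level / 2, df = n - 1) * x1.sd / sqrt(n)

continuous <- c(
  "valid observations"       = n,
  "min"                      = min(x1),
  "median"                   = median(x1),
  "max"                      = max(x1),
  "mean"                     = x1.mean,
  "CI for mean, lower bound" = x1.mean - margin,
  "CI for mean, upper bound" = x1.mean + margin,
  # mean(trim = p) drops floor(n * p) observations from each end.
  "trimmed mean"             = mean(x1, trim = trimmed.percentage),
  "variance"                 = var(x1),
  "standard deviation"       = x1.sd
)

# Categorical column: occurrences and share of the total per category.
counts <- table(x3)
percentages <- 100 * prop.table(counts)

print(continuous)
print(counts)
print(percentages)
```

Quartiles, skewness, and kurtosis are left out on purpose: several conventions exist for each, and this page does not say which one PAL uses.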

Value

Returns a result object containing two DataFrames:

continuous result: DataFrame
Statistics for continuous variables.
categorical result: DataFrame
Statistics for categorical variables.

Examples

## Not run: 

DataFrame df to be analyzed:

> df$Collect()
     X1 X2 X3 X4
1   1.2 NA  1  A
2   2.5 NA  2  C
3   5.2 NA  3  A
4 -10.2 NA  2  A
5   8.5 NA  2  C
6 100.0 NA  3  B

Perform univariate analysis:

> output <- hanaml.UnivariateAnalysis(conn, df, categorical.variable='X3',
                                      significance.level=0.05,
                                      trimmed.percentage=0.2)
> output[[1]]
   VARIABLE_NAME     CATEGORY     STAT_NAME STAT_VALUE
1             X3 __PAL_NULL__         count    0.00000
2             X3 __PAL_NULL__ percentage(%)    0.00000
3             X3            1         count    1.00000
4             X3            1 percentage(%)   16.66667
5             X3            2         count    3.00000
6             X3            2 percentage(%)   50.00000
7             X3            3         count    2.00000
8             X3            3 percentage(%)   33.33333
9             X4 __PAL_NULL__         count    0.00000
10            X4 __PAL_NULL__ percentage(%)    0.00000
11            X4            A         count    3.00000
12            X4            A percentage(%)   50.00000
13            X4            B         count    1.00000
14            X4            B percentage(%)   16.66667
15            X4            C         count    2.00000
16            X4            C percentage(%)   33.33333

> output[[2]]
   VARIABLE_NAME                STAT_NAME  STAT_VALUE
1             X1       valid observations    6.000000
2             X1                      min  -10.200000
3             X1           lower quartile    1.200000
4             X1                   median    3.850000
5             X1           upper quartile    8.500000
6             X1                      max  100.000000
7             X1                     mean   17.866667
8             X1 CI for mean, lower bound  -24.879549
9             X1 CI for mean, upper bound   60.612883
10            X1             trimmed mean    4.350000
11            X1                 variance 1659.142667
12            X1       standard deviation   40.732575
13            X1                 skewness    1.688495
14            X1                 kurtosis    1.036148
15            X2       valid observations    0.000000

## End(Not run)
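
Both results use a long format with one row per variable and statistic, so a single value is found by filtering on `VARIABLE_NAME` and `STAT_NAME`. The sketch below works on a hand-typed local data.frame holding a few rows of the continuous result shown above. It assumes the result has already been brought into R as an ordinary data.frame; the example uses `df$Collect()` for the input data, and whether the result DataFrames are collected the same way is an assumption. The helper `get.stat` and the name `continuous.result` are hypothetical and not part of hana.ml.r.

```r
# A few rows of the continuous result, typed in by hand from the output above.
continuous.result <- data.frame(
  VARIABLE_NAME = c("X1", "X1", "X1", "X2"),
  STAT_NAME     = c("mean", "median", "trimmed mean", "valid observations"),
  STAT_VALUE    = c(17.866667, 3.850000, 4.350000, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up one statistic for one variable.
get.stat <- function(result, variable, stat) {
  hit <- result$VARIABLE_NAME == variable & result$STAT_NAME == stat
  result$STAT_VALUE[hit]
}

print(get.stat(continuous.result, "X1", "median"))

# Alternative view: one named numeric vector of statistics per variable.
stats.by.variable <- lapply(
  split(continuous.result, continuous.result$VARIABLE_NAME),
  function(rows) setNames(rows$STAT_VALUE, rows$STAT_NAME)
)

print(stats.by.variable$X1[["trimmed mean"]])
```

The same filtering applies to the categorical result, with `CATEGORY` as an additional column to match on.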

[Package hana.ml.r version 1.0.8 Index]