R: Gaussian Mixture Model (GMM)

hanaml.GaussianMixture {hana.ml.r}

R Documentation

Gaussian Mixture Model (GMM)

Description

hanaml.GaussianMixture is an R wrapper for the PAL Gaussian Mixture Model (GMM).

Usage

hanaml.GaussianMixture(conn.context,
                       data = NULL,
                       key = NULL,
                       features = NULL,
                       n.components = NULL,
                       init.param = NULL,
                       init.centers = NULL,
                       covariance.type = NULL,
                       shared.covariance = NULL,
                       thread.ratio = NULL,
                       max.iter = NULL,
                       category.weight = NULL,
                       categorical.variable = NULL,
                       error.tol = NULL,
                       regularization = NULL,
                       random.seed = NULL)

Arguments

`conn.context`	`ConnectionContext` Database connection to the SAP HANA system.
`data`	`DataFrame` DataFrame containing the data.
`key`	`character` Name of the ID column.
`features`	`character or list of characters, optional` Names of the feature columns. If not provided, it defaults to all non-ID columns.
`n.components`	`integer, optional` Number of groups. Mandatory when init.param is not 'manual'.
`init.param`	`character` Specifies the initialization mode: `'farthest.first.traversal'`: The initial centers are chosen by the farthest-first traversal algorithm. `'manual'`: The initial centers are the rows the user specifies in init.centers. `'random.means'`: The initial centers are weighted means of all the data, with randomly generated weights. `'k.means++'`: The initial centers are chosen using the k-means++ approach.
`init.centers`	`list of integers, optional` Specifies the rows to be used as initial centers, identified by their sequence number in the data table (starting from 0). For example, to select rows 1, 5, and 9 as centers, pass init.centers = c(1, 5, 9). Mandatory when init.param is 'manual'.
`covariance.type`	`character, optional` Specifies the type of covariance matrices in the model: `'full'`: use full covariance matrices. `'diag'`: use diagonal covariance matrices. `'tied.diag'`: use diagonal covariance matrices with all equal diagonal entries. Defaults to 'full'.
`shared.covariance`	`logical, optional` All clusters share the same covariance matrix if TRUE. Defaults to FALSE.
`thread.ratio`	`double, optional` Controls the proportion of available threads that can be used by this function. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads. Values between 0 and 1 will use up to that percentage of available threads. Defaults to 0.
`max.iter`	`integer, optional` Specifies the maximum number of iterations for the EM algorithm. Defaults to 100.
`category.weight`	`double, optional` Weight of the categorical attributes. Defaults to 0.707.
`categorical.variable`	`character or list of characters, optional` Names of the columns in the data table to treat as categorical variables. No default value.
`error.tol`	`double, optional` Convergence threshold for exiting iterations. Defaults to 1.0e-6.
`regularization`	`double, optional` Regularization added to the diagonal of the covariance matrices to ensure that they are positive definite. Defaults to 1e-6.
`random.seed`	`integer, optional` Indicates the seed used to initialize the random number generator: `0`: Uses the system time. `Not 0`: Uses the specified seed. Defaults to 0.
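
To make the covariance.type and regularization arguments more concrete, the following base-R sketch builds the three covariance shapes from a small sample and adds a ridge to the diagonal. It does not call PAL. The helper name `cov_by_type`, the sample matrix, and the way each shape is derived from the sample covariance are illustrative assumptions, not the library's internals.

```r
# Illustrative only: PAL estimates covariances inside the EM loop. This
# hypothetical helper just shows the shape each covariance.type implies.
cov_by_type <- function(x, type = c("full", "diag", "tied.diag"),
                        regularization = 1e-6) {
  type <- match.arg(type)
  s <- cov(x)                      # sample covariance, p x p
  p <- ncol(x)
  sigma <- switch(type,
    full      = s,                              # all entries free
    diag      = diag(diag(s), nrow = p),        # off-diagonals forced to 0
    tied.diag = diag(mean(diag(s)), nrow = p)   # one shared variance (assumed: the mean)
  )
  # A ridge on the diagonal keeps the matrix positive definite.
  sigma + diag(regularization, nrow = p)
}

# Two perfectly correlated columns give a singular sample covariance.
x <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8))
full_cov <- cov_by_type(x, "full")
diag_cov <- cov_by_type(x, "diag")
tied_cov <- cov_by_type(x, "tied.diag")

# chol() succeeds only for positive-definite input, so this checks the ridge.
chol_ok <- !inherits(try(chol(full_cov), silent = TRUE), "try-error")
```

In this sketch the unregularized full covariance would be singular, which is the situation the regularization argument is described as guarding against.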

Format

R6Class object

Value

labels : DataFrame
Label assigned to each sample.
model : DataFrame
Model content.
stats : DataFrame
Statistic value.

Examples

## Not run: 
Input DataFrame data:
 ID  X1   X2   X3
 0  0.10  0.10  1
 1  0.11  0.10  1
 2  0.10  0.11  1
 3  0.11  0.11  1
 4  0.12  0.11  1

 Model training; a "GaussianMixture" object gmm is returned:
> gmm <- hanaml.GaussianMixture(conn.context = conn,
                                data = data,
                                key = "ID",
                                n.components = 2,
                                init.param = 'k.means++',
                                covariance.type = 'full',
                                shared.covariance = TRUE,
                                thread.ratio = 0,
                                max.iter = 100,
                                category.weight = 0.707,
                                error.tol = 2.5,
                                regularization = 2.5,
                                random.seed = 5)

Expected output:
> gmm$labels$Collect()
      ID  CLUSTER_ID  PROBABILITY
       0     0            1
       1     0            1
       2     0            0
       3     0            0
       4     0            0
       0     1            0
       1     1            0
       2     1            1
       3     1            1
       4     1            1

## End(Not run)
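
The labels table in the example lists one row per ID and cluster pair. If a single hard assignment per sample is wanted, a base-R sketch along the following lines could pick the most probable cluster once the result has been collected into a local data.frame. The data frame here is typed in from the expected output above rather than fetched from SAP HANA, and `hard_labels` is a hypothetical helper name.

```r
# Local copy of the collected labels shown in the example output.
labels <- data.frame(
  ID          = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  CLUSTER_ID  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  PROBABILITY = c(1, 1, 0, 0, 0, 0, 0, 1, 1, 1)
)

# Hypothetical helper: keep, for each ID, the row with the highest probability.
hard_labels <- function(df) {
  # Sort by ID, then by descending probability within each ID.
  ordered <- df[order(df$ID, -df$PROBABILITY), ]
  # The first row per ID is now the most probable cluster.
  best <- ordered[!duplicated(ordered$ID), c("ID", "CLUSTER_ID")]
  rownames(best) <- NULL
  best
}

assignments <- hard_labels(labels)
```

Sorting before dropping duplicates keeps the helper in base R; ties in probability are resolved by the original row order, which may or may not be the desired rule.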

[Package hana.ml.r version 1.0.8 Index]
