hanaml.NaiveBayes {hana.ml.r}                R Documentation

Naive Bayes

Description

hanaml.NaiveBayes is an R wrapper for PAL Naive Bayes.

Usage

hanaml.NaiveBayes(conn.context,
                 data = NULL,
                 key = NULL,
                 features = NULL,
                 formula = NULL,
                 label = NULL,
                 alpha = NULL,
                 discretization = NULL,
                 model.format = NULL,
                 categorical.variable = NULL,
                 thread.ratio = NULL)

Arguments

conn.context

ConnectionContext
The connection to the SAP HANA system.

data

DataFrame
DataFrame containing the data.

key

character, optional
Name of the ID column of data. If not specified, data is assumed to have no ID column.

features

list of character, optional
Names of the feature columns. If features is not provided, it defaults to all non-ID, non-label columns.

formula

formula, optional
Formula describing the model. Cannot be used together with features and label.

label

character, optional
Name of the column in data that specifies the dependent variable. If not specified, it defaults to the last non-ID column.

alpha

double, optional
Laplace smoothing value. Set a positive value to enable Laplace smoothing for categorical variables and use that value as the smoothing parameter. Set the value to 0 to disable Laplace smoothing.
Defaults to 0.

discretization

c('no', 'supervised'), optional
Discretize continuous attributes. Case-insensitive.
  - 'no' or not provided: disable discretization.
  - 'supervised': use supervised discretization on all the continuous attributes.
Defaults to 'no'.

model.format

c('json', 'pmml'), optional
Controls whether to output the model in JSON format or PMML format.
  - 'json' or not provided: JSON format.
  - 'pmml': PMML format.
Defaults to 'json'.

categorical.variable

list of character, optional
INTEGER columns specified in this list will be treated as categorical data. Other INTEGER columns will be treated as continuous.

thread.ratio

double, optional
Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates up to all available threads. Values between 0 and 1 will use that percentage of available threads. Values outside this range tell PAL to heuristically determine the number of threads to use.
Defaults to 0.

Format

R6Class object.

Details

Naive Bayes is a classification algorithm based on Bayes' theorem. It estimates the class-conditional probability by assuming that the attributes are conditionally independent of one another given the class.
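
The independence assumption can be written out as the usual textbook decision rule. This is the standard form of the classifier, shown for orientation; it is not taken from the PAL documentation and may differ in detail from what PAL computes. Here x_1, ..., x_d are the feature values of one row and c ranges over the class labels:

```latex
\hat{y} \;=\; \arg\max_{c} \; P(c) \prod_{j=1}^{d} P(x_j \mid c)
```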

Value

Returns a "NaiveBayes" object with the following values:

Note

The Laplace value (alpha) is stored only in JSON-format models. If the PMML format is chosen, you may need to set the Laplace value (alpha) again in predict() and score().
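
To make the role of alpha concrete, the following base-R sketch applies the standard Laplace-smoothing formula to one categorical column of the example data from the Examples section. It does not call hana.ml.r or PAL; the helper smoothed_cond_prob is hypothetical, and the formula (count + alpha) / (class count + alpha * number of levels) is the common textbook definition, assumed here rather than confirmed for PAL.

```r
# Two columns copied from the example DataFrame (rows ordered by ID 0..9).
homeowner <- c("YES", "NO", "NO", "YES", "NO", "NO", "YES", "NO", "NO", "NO")
defaulted <- c("NO", "NO", "NO", "NO", "YES", "NO", "NO", "YES", "NO", "YES")

# Hypothetical helper: estimate P(feature = v | label = c) with Laplace smoothing.
# alpha = 0 gives the plain relative frequencies.
smoothed_cond_prob <- function(feature, label, alpha = 0) {
  counts <- table(label, feature)   # rows: classes, columns: feature levels
  k <- ncol(counts)                 # number of distinct feature levels
  # rowSums(counts) has one entry per class and is recycled down each column,
  # so every cell is divided by the total of its own class.
  (counts + alpha) / (rowSums(counts) + alpha * k)
}

unsmoothed <- smoothed_cond_prob(homeowner, defaulted, alpha = 0)
smoothed   <- smoothed_cond_prob(homeowner, defaulted, alpha = 1)

# No defaulted borrower is a homeowner, so the unsmoothed estimate is 0 / 3 = 0.
print(unsmoothed["YES", "YES"])
# With alpha = 1 the estimate becomes (0 + 1) / (3 + 1 * 2) = 0.2.
print(smoothed["YES", "YES"])
```

Without smoothing, a single zero count forces the whole product of conditional probabilities for that class to zero, which is why a model trained with alpha > 0 can give different predictions if alpha is not supplied again for a PMML model.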

See Also

hanaml.NaiveBayes

Examples

## Not run: 
Input DataFrame df for training the model:

> df$collect()
   ID HOMEOWNER MARITALSTATUS ANNUALINCOME DEFAULTEDBORROWER
    0       YES        Single        125.0                NO
    1        NO       Married        100.0                NO
    2        NO        Single         70.0                NO
    3       YES       Married        120.0                NO
    4        NO      Divorced         95.0               YES
    5        NO       Married         60.0                NO
    6       YES      Divorced        220.0                NO
    7        NO        Single         85.0               YES
    8        NO       Married         75.0                NO
    9        NO        Single         90.0               YES

Training the model:

> nb <- hanaml.NaiveBayes(conn.context = conn, data = df, alpha = 1.0,
                         model.format = "pmml", thread.ratio = 0.2,
                         features = list('HOMEOWNER', 'MARITALSTATUS', 'ANNUALINCOME'),
                         label = "DEFAULTEDBORROWER")

The mean accuracy on given test data and labels can be computed
with the score function (here df1 holds the test data and labels):

> nb$score(nb, df1, "ID", alpha = 1.0, verbose = TRUE)

Output:
0.875  (double value: mean accuracy on the given test data and labels)

## End(Not run)

[Package hana.ml.r version 1.0.8 Index]