hanaml.FPGrowth {hana.ml.r}    R Documentation

FP-Growth algorithm for association rule mining

Description

FP-Growth algorithm for association rule mining, based on PAL_FPGROWTH and PAL_FPGROWTH_RELATIONAL.

Usage

hanaml.FPGrowth(conn.context, data, used.cols = NULL,
               min.support = NULL, min.confidence = NULL,
               min.lift = NULL,
               relational = FALSE, max.item.length = NULL,
               max.consequent = NULL, ubiquitous = NULL,
               lhs.restrict = NULL, rhs.complement.lhs = NULL,
               rhs.restrict = NULL, lhs.complement.rhs = NULL,
               timeout = NULL, thread.ratio = NULL)

Arguments

conn.context

ConnectionContext
Database connection object.

data

DataFrame
Dataset used for association rule mining.

min.support

numeric
User-specified minimum support value for rule generation.

min.confidence

numeric, optional
User-specified minimum confidence value for rule generation.

used.cols

list of characters, optional
Specifies the columns in data that hold the transaction IDs and item IDs. For example, if the transaction ID column of data is "CUSTOMER" and the item ID column is "ITEM", set this parameter as follows:

used.cols = list("transaction" = "CUSTOMER", "item" = "ITEM").
The transaction ID column defaults to the 1st column of data, and the item ID column defaults to the 2nd column of data.

relational

logical, optional
Whether or not to apply relational logic for association rule mining. This affects the format in which the mined association rules are presented. Defaults to FALSE.

min.lift

numeric, optional
User-specified minimum lift value for rule generation. Defaults to 0.

max.item.length

integer, optional
User-specified maximum number of items in an association rule, counting both antecedent and consequent items.
Defaults to 5.

max.consequent

double, optional
Maximum length of consequent items for association rule generation.
Defaults to 100.

ubiquitous

double, optional
User-specified maximum support value during the frequent items mining phase, i.e. if an item has support value above ubiquitous, it shall be ignored. Defaults to 1.0.

lhs.restrict

list of characters, optional
Specifies items that are only allowed to be antecedent items, i.e. they can only appear on the left-hand side of association rules.

rhs.complement.lhs

logical, optional
If 'lhs.restrict' is not NULL, you can set this parameter to TRUE to restrict the remaining items so that they can only appear on the right-hand side of association rules.

rhs.restrict

list of characters, optional
Specifies items that are only allowed to be consequent items, i.e. they can only appear on the right-hand side of association rules.

lhs.complement.rhs

logical, optional
If 'rhs.restrict' is not NULL, you can set this parameter to TRUE to restrict the remaining items so that they can only appear on the left-hand side of association rules.

timeout

integer, optional
Specifies the maximum run time in seconds for association rule mining. The algorithm will stop running when the specified timeout is reached.

thread.ratio

double, optional
Controls the proportion of available threads to use. The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads. Values outside this range tell PAL to heuristically determine the number of threads to use.
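
The interplay of lhs.restrict and rhs.complement.lhs described above can be hard to picture. The following base R sketch shows one reading of those two arguments as a filter over already-mined rules. It is an illustration only and is not how PAL applies the restriction internally. The helper rule.allowed, the rules table and its item names are hypothetical and not part of hana.ml.r. The "&" separator for multi-item antecedents follows the output format shown in the Examples section.

```r
# Hypothetical helper (not part of hana.ml.r): TRUE if a rule respects
# the left-hand-side restriction as described in the argument text.
rule.allowed <- function(antecedent, consequent, lhs.restrict,
                         rhs.complement.lhs = FALSE) {
  lhs.items <- strsplit(antecedent, "&", fixed = TRUE)[[1]]
  # Restricted items may never appear as the consequent.
  ok <- !(consequent %in% lhs.restrict)
  # With the complement switch on, all other items are right-hand-side only,
  # so the antecedent may consist of restricted items only.
  if (rhs.complement.lhs) {
    ok <- ok && all(lhs.items %in% lhs.restrict)
  }
  ok
}

# Hypothetical rule table, one consequent item per rule.
rules <- data.frame(
  ANTECEDENT = c("A", "C", "A&C", "A&B"),
  CONSEQUENT = c("C", "A", "D", "D"),
  stringsAsFactors = FALSE
)

lhs.restrict <- c("A", "B")

# Only lhs.restrict set: rules with A or B as consequent are dropped.
keep.plain <- mapply(rule.allowed, rules$ANTECEDENT, rules$CONSEQUENT,
                     MoreArgs = list(lhs.restrict = lhs.restrict))

# rhs.complement.lhs = TRUE as well: antecedents must be built from A and B only.
keep.strict <- mapply(rule.allowed, rules$ANTECEDENT, rules$CONSEQUENT,
                      MoreArgs = list(lhs.restrict = lhs.restrict,
                                      rhs.complement.lhs = TRUE))

print(rules[keep.plain, ])
print(rules[keep.strict, ])
```

Under this reading, rhs.restrict and lhs.complement.rhs would mirror the same logic with the two sides of the rule swapped.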

Format

R6Class object.

Value

An "FPGrowth" object with the following attributes:

Examples

## Not run: 
> df
   TRANS ITEM
1      1    1
2      1    2
3      2    2
4      2    3
5      2    4
6      3    1
7      3    3
8      3    4
9      3    5
10     4    1
11     4    4
12     4    5
13     5    1
14     5    2
15     6    1
16     6    2
17     6    3
18     6    4
19     7    1
20     8    1
21     8    2
22     8    3
23     9    1
24     9    2
25     9    3
26    10    2
27    10    3
28    10    5

> fpg <- hanaml.FPGrowth(conn.context = conn, data = df,
                        used.cols = c("transaction" = "TRANS", "item" = "ITEM"),
                        min.support = 0.2, min.confidence = 0.5,
                        max.item.length = 5, min.lift = 1,
                        max.consequent = 1, lhs.restrict = c(1,2,3),
                        timeout = 60)

> fpg$result
  ANTECEDENT CONSEQUENT SUPPORT CONFIDENCE     LIFT
1          2          3     0.5  0.7142857 1.190476
2          3          2     0.5  0.8333333 1.190476
3          3          4     0.3  0.5000000 1.250000
4        1&3          4     0.2  0.5000000 1.250000
5        1&2          3     0.3  0.6000000 1.000000
6        1&3          2     0.3  0.7500000 1.071429

## End(Not run)
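
To make the SUPPORT, CONFIDENCE and LIFT columns concrete, the sketch below recomputes them in plain base R for the rule 2 => 3 (the first row of fpg$result above), using the usual textbook definitions. It needs no database connection. The names df.local, baskets and support are hypothetical helpers introduced for this illustration. Treating the thresholds as inclusive (>=) is an assumption, suggested by rows 4 and 5 of the output, which sit exactly on min.support and min.lift.

```r
# Transactions from the example above, in the same long format as df.
df.local <- data.frame(
  TRANS = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5,
            6, 6, 6, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10),
  ITEM  = c(1, 2, 2, 3, 4, 1, 3, 4, 5, 1, 4, 5, 1, 2,
            1, 2, 3, 4, 1, 1, 2, 3, 1, 2, 3, 2, 3, 5)
)

# One vector of items per transaction.
baskets <- split(df.local$ITEM, df.local$TRANS)

# Hypothetical helper: fraction of transactions containing every item in `items`.
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Rule 2 => 3.
supp <- support(c(2, 3))      # share of transactions with both 2 and 3
conf <- supp / support(2)     # share of transactions with 2 that also have 3
lift <- conf / support(3)     # confidence relative to the base rate of 3

print(c(support = supp, confidence = conf, lift = lift))

# Thresholds from the example call; inclusive comparison is an assumption.
passes <- supp >= 0.2 && conf >= 0.5 && lift >= 1
print(passes)
```

The three printed values should agree with the first row of the result table up to print rounding.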

[Package hana.ml.r version 1.0.8 Index]