FP-Growth — hanaml.FPGrowth • hana.ml.r

hanaml.FPGrowth is an R wrapper for SAP HANA PAL FPGROWTH and FPGROWTH_RELATIONAL.

hanaml.FPGrowth(
  data,
  used.cols = NULL,
  min.support = NULL,
  min.confidence = NULL,
  min.lift = NULL,
  relational = FALSE,
  max.item.length = NULL,
  max.consequent = NULL,
  ubiquitous = NULL,
  lhs.restrict = NULL,
  rhs.complement.lhs = NULL,
  rhs.restrict = NULL,
  lhs.complement.rhs = NULL,
  timeout = NULL,
  thread.ratio = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

used.cols

list of characters, optional
Specifies the columns in data that hold the transaction IDs and item IDs. For example, if the transaction ID column of data is "CUSTOMER" and the item ID column is "ITEM", set this parameter as follows:

used.cols = list(transaction = "CUSTOMER", item = "ITEM")

The transaction ID column defaults to the 1st column of data, and the item ID column defaults to the 2nd column of data.

min.support

numeric, optional
User-specified minimum support value for rule generation.
Defaults to 0.

min.confidence

numeric, optional
User-specified minimum confidence value for rule generation.
Defaults to 0.

min.lift

numeric, optional
User-specified minimum lift value for rule generation.
Defaults to 0.
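
The three thresholds above (min.support, min.confidence, min.lift) can be illustrated with a small base R sketch. It assumes the standard definitions of support, confidence, and lift; the toy transactions and the helper names support and rule_metrics are hypothetical and are not part of hana.ml.r.

```r
# Hypothetical toy transactions: each element is the item set of one basket.
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk")
)

# Support: fraction of transactions that contain every item in `items`.
support <- function(items, transactions) {
  mean(vapply(transactions, function(tx) all(items %in% tx), logical(1)))
}

# Metrics for the rule lhs -> rhs, using the standard definitions.
rule_metrics <- function(lhs, rhs, transactions) {
  s_rule <- support(c(lhs, rhs), transactions)
  conf <- s_rule / support(lhs, transactions)
  lift <- conf / support(rhs, transactions)
  c(support = s_rule, confidence = conf, lift = lift)
}

m <- rule_metrics("bread", "milk", transactions)

# A rule is kept only if it meets all three minimums (illustrative values).
keep <- m[["support"]] >= 0.2 && m[["confidence"]] >= 0.5 && m[["lift"]] >= 1
print(m)
print(keep)
```

In this toy data the rule bread -> milk passes the support and confidence minimums but has a lift below 1, so a min.lift of 1 would discard it.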

relational

logical, optional
Whether or not to apply relational logic for association rule mining. This affects the format in which the mined association rules are returned.
Defaults to FALSE.

max.item.length

integer, optional
User-specified maximum number of items in an association rule, counting both antecedent and consequent items.
Defaults to 10.

max.consequent

double, optional
Maximum length of consequent items for association rule generation.
Defaults to 100.

ubiquitous

double, optional
User-specified maximum support value during the frequent items mining phase, i.e. an item whose support value is above ubiquitous is ignored.
Defaults to 1.0.
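
As a rough illustration of the effect of ubiquitous, the base R sketch below computes per-item support on hypothetical long-format data and drops items whose support exceeds the threshold. The data, the threshold value, and the variable names are assumptions for illustration; the actual filtering is performed inside PAL.

```r
# Hypothetical long-format data: one row per (transaction, item) pair.
df <- data.frame(
  TRANS = c(1, 1, 2, 2, 3, 3, 4),
  ITEM  = c("bag", "milk", "bag", "bread", "bag", "milk", "bag"),
  stringsAsFactors = FALSE
)

n_trans <- length(unique(df$TRANS))

# Item support: share of transactions in which each item occurs.
item_support <- sapply(split(df$TRANS, df$ITEM),
                       function(x) length(unique(x))) / n_trans

ubiquitous <- 0.9  # assumed threshold for this sketch

# Items whose support is above the threshold are ignored during mining.
ignored <- names(item_support)[item_support > ubiquitous]
kept_df <- df[!(df$ITEM %in% ignored), ]
print(item_support)
print(ignored)
```

Here "bag" occurs in every transaction, so it is excluded, while "milk" and "bread" remain.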

lhs.restrict

list of characters, optional
Specifies the items that are only allowed to be antecedent items, i.e. they can only appear on the left-hand side of association rules.
No default value.

rhs.complement.lhs

logical, optional
If lhs.restrict is not NULL, you can set this parameter to TRUE to restrict the remaining items so that they can only appear on the right-hand side of association rules.
Defaults to FALSE.

rhs.restrict

list of characters, optional
Specifies the items that are only allowed to be consequent items, i.e. they can only appear on the right-hand side of association rules.
No default value.

lhs.complement.rhs

logical, optional
If rhs.restrict is not NULL, you can set this parameter to TRUE to restrict the remaining items so that they can only appear on the left-hand side of association rules.
Defaults to FALSE.

timeout

integer, optional
Specifies the maximum run time in seconds for association rule mining. The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

Value

A "FPGrowth" object with the following attributes:

result: DataFrame
Mined association rules as a whole. Each rule has its antecedent/consequent items and support/confidence/lift values. Available only when relational is FALSE.
antecedent: DataFrame
Antecedent item information of mined association rules. Available only when relational is TRUE.
consequent: DataFrame
Consequent item information of mined association rules. Available only when relational is TRUE.
statistics: DataFrame
Support/confidence/lift values of mined association rules. Available only when relational is TRUE.
model: DataFrame
Mined association rules in PMML format.
Available only when pmml.export is 'single-row' or 'multi-row'.
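
To give a feel for how the single result table relates to the three relational tables, the base R sketch below splits two rules into antecedent, consequent, and statistics tables linked by a rule key. This is only an illustration: the column names RULE_ID, ANTECEDENT_ITEM, and CONSEQUENT_ITEM and the helper explode are hypothetical, and the actual layout returned by PAL may differ.

```r
# Two rules in the single-table layout (values as in the Examples output).
result <- data.frame(
  ANTECEDENT = c("1&3", "3"),
  CONSEQUENT = c("4", "4"),
  SUPPORT    = c(0.2, 0.3),
  CONFIDENCE = c(0.5, 0.5),
  LIFT       = c(1.25, 1.25),
  stringsAsFactors = FALSE
)

rule_id <- seq_len(nrow(result))  # hypothetical key linking the tables

# Split "a&b" strings into one row per item, tagged with the rule key.
explode <- function(x, ids, item_col) {
  parts <- strsplit(x, "&", fixed = TRUE)
  out <- data.frame(RULE_ID = rep(ids, lengths(parts)),
                    ITEM = unlist(parts),
                    stringsAsFactors = FALSE)
  names(out)[2] <- item_col
  out
}

antecedent <- explode(result$ANTECEDENT, rule_id, "ANTECEDENT_ITEM")
consequent <- explode(result$CONSEQUENT, rule_id, "CONSEQUENT_ITEM")
statistics <- data.frame(RULE_ID = rule_id,
                         result[c("SUPPORT", "CONFIDENCE", "LIFT")])
print(antecedent)
print(consequent)
print(statistics)
```

The point of the relational form is that each item sits in its own row, so rules can be joined and filtered by item without parsing "&"-separated strings.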

Examples

Input DataFrame data:


> data$Collect()
   TRANS ITEM
1      1    1
2      1    2
3      2    2
4      2    3
5      2    4
6      3    1
7      3    3
8      3    4
9      3    5
10     4    1
11     4    4
12     4    5
13     5    1
14     5    2
15     6    1
16     6    2
17     6    3
18     6    4
19     7    1
20     8    1
21     8    2
22     8    3
23     9    1
24     9    2
25     9    3
26    10    2
27    10    3
28    10    5

Call the function:


> fpg <- hanaml.FPGrowth(data = data,
                        used.cols = list(transaction = "TRANS",
                                         item = "ITEM"),
                        min.support = 0.2, min.confidence = 0.5,
                        max.item.length = 5, min.lift = 1,
                        max.consequent = 1, lhs.restrict = c(1,2,3),
                        timeout = 60)

Output:


> fpg$result$Collect()
  ANTECEDENT CONSEQUENT SUPPORT CONFIDENCE     LIFT
1          2          3     0.5  0.7142857 1.190476
2          3          2     0.5  0.8333333 1.190476
3          3          4     0.3  0.5000000 1.250000
4        1&3          4     0.2  0.5000000 1.250000
5        1&2          3     0.3  0.6000000 1.000000
6        1&3          2     0.3  0.7500000 1.071429
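
The values in this output can be cross-checked without a HANA connection. The base R sketch below rebuilds the example input as a local data.frame and recomputes support, confidence, and lift for the six rules using the standard definitions. The helper support and the variable names are illustrative assumptions; the sketch does not perform FP-Growth mining itself, it only re-evaluates the rules listed above.

```r
# Example input, rebuilt locally as a plain data.frame.
df <- data.frame(
  TRANS = c(1,1, 2,2,2, 3,3,3,3, 4,4,4, 5,5, 6,6,6,6, 7, 8,8,8, 9,9,9, 10,10,10),
  ITEM  = c(1,2, 2,3,4, 1,3,4,5, 1,4,5, 1,2, 1,2,3,4, 1, 1,2,3, 1,2,3, 2,3,5)
)

# One item set per transaction.
baskets <- split(df$ITEM, df$TRANS)

# Fraction of transactions containing all of `items`.
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# The six rules from the output, as antecedent/consequent item sets.
rules <- list(
  list(lhs = 2,       rhs = 3),
  list(lhs = 3,       rhs = 2),
  list(lhs = 3,       rhs = 4),
  list(lhs = c(1, 3), rhs = 4),
  list(lhs = c(1, 2), rhs = 3),
  list(lhs = c(1, 3), rhs = 2)
)

# Recompute the metrics for each rule.
check <- do.call(rbind, lapply(rules, function(r) {
  s <- support(c(r$lhs, r$rhs))
  conf <- s / support(r$lhs)
  data.frame(ANTECEDENT = paste(r$lhs, collapse = "&"),
             CONSEQUENT = paste(r$rhs, collapse = "&"),
             SUPPORT = s, CONFIDENCE = conf,
             LIFT = conf / support(r$rhs),
             stringsAsFactors = FALSE)
}))
print(check)
```

For instance, items 1, 3, and 4 occur together in 2 of the 10 transactions (support 0.2), items 1 and 3 occur together in 4 (confidence 0.2 / 0.4 = 0.5), and item 4 occurs in 4 (lift 0.5 / 0.4 = 1.25), which matches row 4 of the output.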