FPGrowth

class hana_ml.algorithms.pal.association.FPGrowth(min_support=None, min_confidence=None, relational=None, min_lift=None, max_conseq=None, max_len=None, ubiquitous=None, thread_ratio=None, timeout=None)

FP-Growth is an algorithm that finds frequent patterns in transactions without generating candidate itemsets.
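To make the idea of mining without candidate generation concrete, here is a minimal pure-Python sketch of FP-Growth. It is an illustration only and not the PAL implementation: the names `Node`, `build_tree` and `fp_growth` are hypothetical, and the transactions are the ones used in the Examples section. The sketch builds a prefix tree of frequency-ordered transactions and then recurses on each item's conditional pattern base.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree; return (header table, frequent item counts)."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, weight in weighted_transactions:
        # Insert frequent items in descending-frequency order so that
        # transactions sharing a prefix also share a path in the tree.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): absolute support count}."""
    header, frequent = build_tree(weighted_transactions, min_count)
    patterns = {}
    for item, count in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = count
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns


# Transactions 1..10 from the Examples section.
transactions = [
    [1, 2], [2, 3, 4], [1, 3, 4, 5], [1, 4, 5], [1, 2],
    [1, 2, 3, 4], [1], [1, 2, 3], [1, 2, 3], [2, 3, 5],
]
# min_support=0.2 over 10 transactions corresponds to a count of 2.
patterns = fp_growth([(t, 1) for t in transactions], min_count=2)
```

Dividing a count in `patterns` by the number of transactions gives the relative support that min_support is compared against.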

Parameters:

min_support : float, optional

User-specified minimum support, with valid range [0, 1].

Defaults to 0.

min_confidence : float, optional

User-specified minimum confidence, with valid range [0, 1].

Defaults to 0.

relational : bool, optional

Whether or not to apply relational logic in the FPGrowth algorithm.

If False, a single result table is produced; otherwise, the result is split into three tables: antecedent, consequent, and statistics.

Defaults to False.

min_lift : float, optional

User-specified minimum lift.

Defaults to 0.

max_conseq : int, optional

Maximum length of consequent items.

Defaults to 10.

max_len : int, optional

Maximum total length of antecedent items and consequent items in the output.

Defaults to 10.

ubiquitous : float, optional

Item sets whose support values are greater than this number will be ignored during frequent items mining.

Defaults to 1.0.

thread_ratio : float, optional

Specifies the ratio of total number of threads that can be used by this function.

The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads.

Values outside the range will be ignored and this function heuristically determines the number of threads to use.

Defaults to 0.

timeout : int, optional

Specifies the maximum run time in seconds.

The algorithm will stop running when the specified timeout is reached.

Defaults to 3600.
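The effect of the ubiquitous threshold described above can be pictured with a small pure-Python sketch. It is a simplification that assumes the filter is applied to single items only, and the helper names `item_supports` and `drop_ubiquitous` are hypothetical; it does not reproduce how PAL applies the parameter internally.

```python
# Transactions 1..10 from the Examples section.
transactions = [
    [1, 2], [2, 3, 4], [1, 3, 4, 5], [1, 4, 5], [1, 2],
    [1, 2, 3, 4], [1], [1, 2, 3], [1, 2, 3], [2, 3, 5],
]


def item_supports(transactions):
    """Relative support of every single item."""
    total = len(transactions)
    counts = {}
    for items in transactions:
        for item in set(items):
            counts[item] = counts.get(item, 0) + 1
    return {item: count / total for item, count in counts.items()}


def drop_ubiquitous(supports, ubiquitous):
    """Keep only items whose support does not exceed the threshold."""
    return {item: s for item, s in supports.items() if s <= ubiquitous}


supports = item_supports(transactions)
# Item 1 appears in 8 of 10 transactions, so a threshold of 0.75 drops it.
kept = drop_ubiquitous(supports, ubiquitous=0.75)
```

With the default of 1.0 nothing is dropped, because no support value can exceed 1.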

Examples

Input data for association rule mining:

>>> df.collect()
    TRANS  ITEM
        1     1
        1     2
        2     2
        2     3
        2     4
        3     1
        3     3
        3     4
        3     5
        4     1
        4     4
        4     5
        5     1
        5     2
        6     1
        6     2
        6     3
        6     4
        7     1
        8     1
        8     2
        8     3
        9     1
        9     2
        9     3
       10     2
       10     3
       10     5

Set up parameters:

>>> fpg = FPGrowth(min_support=0.2,
                   min_confidence=0.5,
                   relational=False,
                   min_lift=1.0,
                   max_conseq=1,
                   max_len=5,
                   ubiquitous=1.0,
                   thread_ratio=0,
                   timeout=3600)

Mine association rules from the input data using the FPGrowth algorithm, and check the results:

>>> fpg.fit(data=df, lhs_restrict=[1,2,3])
>>> fpg.result_.collect()
  ANTECEDENT  CONSEQUENT  SUPPORT  CONFIDENCE      LIFT
0          2           3      0.5    0.714286  1.190476
1          3           2      0.5    0.833333  1.190476
2          3           4      0.3    0.500000  1.250000
3        1&2           3      0.3    0.600000  1.000000
4        1&3           2      0.3    0.750000  1.071429
5        1&3           4      0.2    0.500000  1.250000
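The rules and statistics in the table above can be cross-checked with a brute-force enumeration in pure Python. This is not FP-Growth and not how PAL computes the result; it only applies the definitions of support, confidence and lift. It assumes that lhs_restrict=[1,2,3] means antecedents may contain only items 1, 2 and 3, and the helper names `support` and `brute_force_rules` are hypothetical. Exact fractions are used so that threshold comparisons such as lift >= 1.0 are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

# Transaction ID -> set of items, copied from the input data above.
transactions = {
    1: {1, 2}, 2: {2, 3, 4}, 3: {1, 3, 4, 5}, 4: {1, 4, 5}, 5: {1, 2},
    6: {1, 2, 3, 4}, 7: {1}, 8: {1, 2, 3}, 9: {1, 2, 3}, 10: {2, 3, 5},
}


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))


def brute_force_rules(min_support, min_confidence, min_lift, lhs_allowed):
    """Enumerate rules with one consequent item (max_conseq=1)."""
    all_items = sorted(set().union(*transactions.values()))
    rules = {}
    for size in range(1, len(lhs_allowed) + 1):
        for antecedent in combinations(sorted(lhs_allowed), size):
            lhs = frozenset(antecedent)
            for consequent in all_items:
                if consequent in lhs:
                    continue
                sup = support(lhs | {consequent})
                if sup == 0 or sup < min_support:
                    continue
                conf = sup / support(lhs)
                lift = conf / support(frozenset({consequent}))
                if conf >= min_confidence and lift >= min_lift:
                    rules[(antecedent, consequent)] = (
                        float(sup), float(conf), float(lift))
    return rules


rules = brute_force_rules(min_support=Fraction(2, 10),
                          min_confidence=Fraction(1, 2),
                          min_lift=Fraction(1),
                          lhs_allowed=[1, 2, 3])
for (antecedent, consequent), (sup, conf, lift) in sorted(rules.items()):
    print(antecedent, consequent, sup, round(conf, 6), round(lift, 6))
```

For example, the rule 2 -> 3 has support 5/10, confidence (5/10)/(7/10) and lift equal to that confidence divided by the support of item 3 (6/10).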

Set up the FPGrowth algorithm using relational logic:

>>> fpgr = FPGrowth(min_support=0.2,
                    min_confidence=0.5,
                    relational=True,
                    min_lift=1.0,
                    max_conseq=1,
                    max_len=5,
                    ubiquitous=1.0,
                    thread_ratio=0,
                    timeout=3600)

Mine association rules from the input data again using the FPGrowth algorithm, and check the resulting tables:

>>> fpgr.fit(data=df, rhs_restrict=[1, 2, 3])
>>> fpgr.antec_.collect()
   RULE_ID  ANTECEDENTITEM
      0               2
      1               3
      2               3
      3               1
      3               2
      4               1
      4               3
      5               1
      5               3

>>> fpgr.conseq_.collect()
   RULE_ID  CONSEQUENTITEM
      0               3
      1               2
      2               4
      3               3
      4               2
      5               4

>>> fpgr.stats_.collect()
   RULE_ID  SUPPORT  CONFIDENCE      LIFT
      0      0.5    0.714286  1.190476
      1      0.5    0.833333  1.190476
      2      0.3    0.500000  1.250000
      3      0.3    0.600000  1.000000
      4      0.3    0.750000  1.071429
      5      0.2    0.500000  1.250000
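The three relational tables carry the same information as the single result table, keyed by RULE_ID. As a sketch of how they fit together, the pure-Python snippet below rebuilds the combined rows from values copied out of the tables above. The plain lists stand in for the collected tables, and the helper `join_items` is hypothetical.

```python
from collections import defaultdict

# (RULE_ID, item) rows copied from antec_ and conseq_ above.
antec = [(0, 2), (1, 3), (2, 3), (3, 1), (3, 2), (4, 1), (4, 3), (5, 1), (5, 3)]
conseq = [(0, 3), (1, 2), (2, 4), (3, 3), (4, 2), (5, 4)]
# (RULE_ID, SUPPORT, CONFIDENCE, LIFT) rows copied from stats_ above.
stats = [
    (0, 0.5, 0.714286, 1.190476),
    (1, 0.5, 0.833333, 1.190476),
    (2, 0.3, 0.500000, 1.250000),
    (3, 0.3, 0.600000, 1.000000),
    (4, 0.3, 0.750000, 1.071429),
    (5, 0.2, 0.500000, 1.250000),
]


def join_items(rows):
    """Group items by rule ID and join them with '&', as in result_."""
    grouped = defaultdict(list)
    for rule_id, item in rows:
        grouped[rule_id].append(str(item))
    return {rule_id: "&".join(items) for rule_id, items in grouped.items()}


lhs = join_items(antec)
rhs = join_items(conseq)
combined = [(lhs[rule_id], rhs[rule_id], sup, conf, lift)
            for rule_id, sup, conf, lift in stats]
for row in combined:
    print(row)
```

Each row of `combined` has the same layout as a row of result_ in the non-relational example.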

Attributes:

result_ : DataFrame

Mined association rules and related statistics, structured as follows:

1st column : antecedent (leading) items,

2nd column : consequent (dependent) items,

3rd column : support value,

4th column : confidence value,

5th column : lift value.

Available only when relational is False.

antec_ : DataFrame

Antecedent items of mined association rules, structured as follows:

1st column : association rule ID,

2nd column : antecedent items of the corresponding association rule.

Available only when relational is True.

conseq_ : DataFrame

Consequent items of mined association rules, structured as follows:

1st column : association rule ID,

2nd column : consequent items of the corresponding association rule.

Available only when relational is True.

stats_ : DataFrame

Statistics of the mined association rules, structured as follows:

1st column : rule ID,

2nd column : support value of the rule,

3rd column : confidence value of the rule,

4th column : lift value of the rule.

Available only when relational is True.

Methods

fit(data[, transaction, item, lhs_restrict, ...])

Association rule mining from the input data.

fit(data, transaction=None, item=None, lhs_restrict=None, rhs_restrict=None, lhs_complement_rhs=None, rhs_complement_lhs=None)

Association rule mining from the input data.

Parameters:

data : DataFrame

Input data for association rule mining.

transaction : str, optional

Name of the transaction column.

Defaults to the first column if not provided.

item : str, optional

Name of the item column.

Defaults to the last non-transaction column if not provided.

lhs_restrict : list of int/str, optional

Specifies the items that are allowed only on the left-hand side of association rules.

Elements in the list should be of the same type as the item column.

rhs_restrict : list of int/str, optional

Specifies the items that are allowed only on the right-hand side of association rules.

Elements in the list should be of the same type as the item column.

lhs_complement_rhs : bool, optional

If you use rhs_restrict to restrict some items to the right-hand side of the association rules, you can set this parameter to True to restrict the complement items to the left-hand side.

For example, if you have 100 items (i₁,i₂,...,i₁₀₀), and want to restrict i₁ and i₂ to the right-hand side, and i₃, i₄,..., i₁₀₀ to the left-hand side, you can set the parameters as follows:

...

rhs_restrict = [i₁, i₂],

lhs_complement_rhs = True,

...

Defaults to False.

rhs_complement_lhs : bool, optional

If you use lhs_restrict to restrict some items to the left-hand side of association rules, you can set this parameter to True to restrict the complement items to the right-hand side.

Defaults to False.
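The complement described for lhs_complement_rhs and rhs_complement_lhs is a plain set difference over the item universe. The pure-Python sketch below works through the 100-item example; the helper `complement_items` and the string item names are hypothetical and are not part of the hana_ml API.

```python
def complement_items(all_items, restricted):
    """Return the items of `all_items` not in `restricted`, keeping order."""
    restricted_set = set(restricted)
    return [item for item in all_items if item not in restricted_set]


# Hypothetical item names i1 .. i100 standing in for the 100 items.
all_items = [f"i{n}" for n in range(1, 101)]
rhs_restrict = ["i1", "i2"]

# With lhs_complement_rhs=True, these are the items restricted to the
# left-hand side: i3, i4, ..., i100.
lhs_items = complement_items(all_items, rhs_restrict)
print(len(lhs_items), lhs_items[:3])
```

Setting the flag saves listing the 98 remaining items explicitly in lhs_restrict.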

property fit_hdbprocedure: Returns the generated hdbprocedure for fit.

property predict_hdbprocedure: Returns the generated hdbprocedure for predict.

Inherited Methods from PALBase

In addition to the methods described above, the FPGrowth class inherits methods from the PALBase class. Refer to PAL Base for details.