FPGrowth
- class hana_ml.algorithms.pal.association.FPGrowth(min_support=None, min_confidence=None, relational=None, min_lift=None, max_conseq=None, max_len=None, ubiquitous=None, thread_ratio=None, timeout=None)
The Frequent Pattern Growth (FP-Growth) algorithm finds frequent patterns in a transaction dataset without generating candidate itemsets. It does so by building a prefix tree (FP-tree) that compresses the transaction data, and then retrieving frequent itemsets from that tree efficiently.
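The prefix-tree idea described above can be illustrated with a small pure-Python sketch. It is not the PAL implementation: the names `FPNode`, `build_fp_tree`, and `item_support_count` and the toy transactions are hypothetical, and the sketch only covers tree construction, not the recursive mining step.

```python
from collections import Counter, defaultdict


class FPNode:
    """One node of the prefix tree: an item, its count, and child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree from a list of item collections (illustrative only).

    Returns the root node and a header table that maps each frequent item
    to the list of tree nodes carrying it.
    """
    # Pass 1: count in how many transactions each item occurs,
    # then keep only the frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(item=None)
    header = defaultdict(list)

    # Pass 2: insert each transaction with its items ordered by descending
    # frequency, so that transactions with common prefixes share nodes.
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def item_support_count(header, item):
    """Support count of a single item, read from the tree alone."""
    return sum(node.count for node in header[item])


# Hypothetical toy data: five transactions over items 1..4.
transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 3, 4], [2, 3, 4]]
root, header = build_fp_tree(transactions, min_count=2)
```

Because the tree stores counts on shared prefixes, item frequencies can be recovered from the header table without rescanning the transactions, which is what lets the algorithm avoid candidate generation.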
- Parameters:
- min_supportfloat, optional
Specifies the minimum support value, which falls within the valid range of [0, 1].
Defaults to 0.
- min_confidencefloat, optional
Specifies the minimum confidence value, which falls within the valid range of [0, 1].
Defaults to 0.
- relationalbool, optional
Determines whether relational logic should be applied within the FP-Growth algorithm. If set to False, a single combined results table is produced. If set to True, the result is split across three tables: antecedent, consequent, and statistics.
Defaults to False.
- min_liftfloat, optional
Specifies the minimum lift.
Defaults to 0.
- max_conseqint, optional
Specifies the maximum length of consequent items.
Defaults to 10.
- max_lenint, optional
Specifies the maximum combined length of antecedent items and consequent items in the output.
Defaults to 10.
- ubiquitousfloat, optional
This parameter is used to ignore item sets with support values greater than this threshold during frequent itemset mining.
Defaults to 1.0.
- thread_ratiofloat, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, in which case the function determines the number of threads heuristically.
Defaults to 0.
- timeoutint, optional
Specifies the maximum run time for the algorithm in seconds. The algorithm will cease computation if the specified timeout is exceeded.
Defaults to 3600.
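As an illustration of what the min_support, min_confidence, and min_lift thresholds act on, the sketch below computes the three statistics for one rule using their standard definitions. The helper names `rule_metrics` and `passes_thresholds` and the toy transactions are hypothetical, and whether PAL compares with `>=` or `>` at the boundary is an assumption here.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Standard definitions, computed from transaction counts:
      support    = P(A and C)
      confidence = P(A and C) / P(A)
      lift       = confidence / P(C)
    Assumes both the antecedent and the consequent occur at least once.
    """
    sets = [set(t) for t in transactions]
    n = len(sets)
    a, c = set(antecedent), set(consequent)
    count_a = sum(a <= s for s in sets)
    count_c = sum(c <= s for s in sets)
    count_ac = sum((a | c) <= s for s in sets)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = count_ac / n
    confidence = count_ac / count_a
    lift = (count_ac * n) / (count_a * count_c)
    return support, confidence, lift


def passes_thresholds(metrics, min_support=0.0, min_confidence=0.0, min_lift=0.0):
    """Check a rule against the thresholds (inclusive comparison is an assumption)."""
    support, confidence, lift = metrics
    return (
        support >= min_support
        and confidence >= min_confidence
        and lift >= min_lift
    )


# Hypothetical toy data: five transactions over items 1..4.
transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 3, 4], [2, 3, 4]]
metrics = rule_metrics(transactions, antecedent=[2], consequent=[3])
print(metrics)  # (0.6, 0.75, 0.9375)
```

With these numbers the rule 2 -> 3 would survive `min_support=0.2, min_confidence=0.5` but be filtered out by `min_lift=1.0`, since its lift is below 1.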
Examples
Input DataFrame df:
>>> df.collect()
    TRANS  ITEM
0       1     1
1       1     2
......
26     10     3
27     10     5
Initialize an FPGrowth object:
>>> fpg = FPGrowth(min_support=0.2, min_confidence=0.5, relational=False, min_lift=1.0, max_conseq=1, max_len=5, ubiquitous=1.0, thread_ratio=0, timeout=3600)
Perform fit():
>>> fpg.fit(data=df, lhs_restrict=[1,2,3])
>>> fpg.result_.collect()
  ANTECEDENT  CONSEQUENT  SUPPORT  CONFIDENCE      LIFT
0          2           3      0.5    0.714286  1.190476
1          3           2      0.5    0.833333  1.190476
2          3           4      0.3    0.500000  1.250000
3        1&2           3      0.3    0.600000  1.000000
4        1&3           2      0.3    0.750000  1.071429
5        1&3           4      0.2    0.500000  1.250000
Alternatively, initialize an FPGrowth object with relational logic enabled:
>>> fpgr = FPGrowth(min_support=0.2, min_confidence=0.5, relational=True, min_lift=1.0, max_conseq=1, max_len=5, ubiquitous=1.0, thread_ratio=0, timeout=3600)
Perform fit():
>>> fpgr.fit(data=df, rhs_restrict=[1, 2, 3])
>>> fpgr.antec_.collect()
   RULE_ID  ANTECEDENTITEM
0        0               2
1        1               3
2        2               3
...
6        4               3
7        5               1
8        5               3
>>> fpgr.conseq_.collect()
   RULE_ID  CONSEQUENTITEM
0        0               3
1        1               2
2        2               4
3        3               3
4        4               2
5        5               4
>>> fpgr.stats_.collect()
   RULE_ID  SUPPORT  CONFIDENCE      LIFT
0        0      0.5    0.714286  1.190476
1        1      0.5    0.833333  1.190476
2        2      0.3    0.500000  1.250000
3        3      0.3    0.600000  1.000000
4        4      0.3    0.750000  1.071429
5        5      0.2    0.500000  1.250000
- Attributes:
- result_DataFrame
Mined association rules and related statistics, structured as follows:
1st column : antecedent(leading) items,
2nd column : consequent(dependent) items,
3rd column : support value,
4th column : confidence value,
5th column : lift value.
Available only when relational is False.
- antec_DataFrame
Antecedent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : antecedent items of the corresponding association rule.
Available only when relational is True.
- conseq_DataFrame
Consequent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : consequent items of the corresponding association rule.
Available only when relational is True.
- stats_DataFrame
Statistics of the mined association rules, structured as follows:
1st column : rule ID,
2nd column : support value of the rule,
3rd column : confidence value of the rule,
4th column : lift value of the rule.
Available only when relational is True.
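The relationship between the three relational tables and the single combined table can be sketched in plain Python. The rows below are copied from the rules that the example output shows in full (rule IDs 0, 1, 2, and 5), held as tuples rather than as hana_ml DataFrames. The helper name `combine_rule_tables` is hypothetical, and the "&" separator is taken from the combined example output.

```python
from collections import defaultdict

# (RULE_ID, ANTECEDENTITEM) rows for rules 0, 1, 2 and 5 of the example.
antec = [(0, 2), (1, 3), (2, 3), (5, 1), (5, 3)]
# (RULE_ID, CONSEQUENTITEM) rows for the same rules.
conseq = [(0, 3), (1, 2), (2, 4), (5, 4)]
# (RULE_ID, SUPPORT, CONFIDENCE, LIFT) rows for the same rules.
stats = [
    (0, 0.5, 0.714286, 1.190476),
    (1, 0.5, 0.833333, 1.190476),
    (2, 0.3, 0.500000, 1.250000),
    (5, 0.2, 0.500000, 1.250000),
]


def combine_rule_tables(antec, conseq, stats, sep="&"):
    """Join the three relational tables on the rule ID into combined rows."""
    lhs = defaultdict(list)
    rhs = defaultdict(list)
    for rule_id, item in antec:
        lhs[rule_id].append(str(item))
    for rule_id, item in conseq:
        rhs[rule_id].append(str(item))
    return [
        (sep.join(lhs[rule_id]), sep.join(rhs[rule_id]), support, confidence, lift)
        for rule_id, support, confidence, lift in stats
    ]


for row in combine_rule_tables(antec, conseq, stats):
    print(row)
```

Each printed tuple has the same five fields as a row of result_: antecedent, consequent, support, confidence, and lift.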
Methods
fit(data[, transaction, item, lhs_restrict, ...]): Association rule mining on the given data.
get_model_metrics(): Get the model metrics.
get_score_metrics(): Get the score metrics.
- fit(data, transaction=None, item=None, lhs_restrict=None, rhs_restrict=None, lhs_complement_rhs=None, rhs_complement_lhs=None)
Association rule mining on the given data.
- Parameters:
- dataDataFrame
The input data.
- transactionstr, optional
Name of the transaction column.
Defaults to the first column if not provided.
- itemstr, optional
Name of the item column.
Defaults to the last non-transaction column if not provided.
- lhs_restrictlist of int/str, optional
Specifies items that are only allowed on the left-hand-side of association rules.
Elements in the list should be the same type as the item column.
- rhs_restrictlist of int/str, optional
Specifies items that are only allowed on the right-hand-side of association rules.
Elements in the list should be the same type as the item column.
- lhs_complement_rhsbool, optional
If you use rhs_restrict to restrict some items to the right-hand-side of the association rules, you can set this parameter to True to restrict the complement items to the left-hand-side.
For example, if you have 100 items (i1, i2, ..., i100) and want to restrict i1 and i2 to the right-hand-side and i3, i4, ..., i100 to the left-hand-side, you can set the parameters as follows:
...
rhs_restrict = [i1, i2],
lhs_complement_rhs = True,
...
Defaults to False.
- rhs_complement_lhsbool, optional
If you use lhs_restrict to restrict some items to the left-hand-side of association rules, you can set this parameter to True to restrict the complement items to the right-hand-side.
Defaults to False.
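One way to read the four restriction parameters is as a filter over candidate rules. The sketch below encodes that reading in plain Python. The function `rule_allowed` is hypothetical and is not part of hana_ml, and the exact behaviour inside PAL may differ from this interpretation.

```python
def rule_allowed(
    antecedent,
    consequent,
    lhs_restrict=(),
    rhs_restrict=(),
    lhs_complement_rhs=False,
    rhs_complement_lhs=False,
):
    """Return True if a rule respects the side restrictions (assumed semantics)."""
    lhs_only = set(lhs_restrict)
    rhs_only = set(rhs_restrict)
    a, c = set(antecedent), set(consequent)

    # Items restricted to one side must not appear on the other side.
    if a & rhs_only or c & lhs_only:
        return False
    # lhs_complement_rhs: every item outside rhs_restrict is confined to the
    # left-hand-side, so the consequent may only contain rhs_restrict items.
    if lhs_complement_rhs and not c <= rhs_only:
        return False
    # rhs_complement_lhs: every item outside lhs_restrict is confined to the
    # right-hand-side, so the antecedent may only contain lhs_restrict items.
    if rhs_complement_lhs and not a <= lhs_only:
        return False
    return True


# Hypothetical item names following the i1, i2, ... example above.
restrictions = dict(rhs_restrict=["i1", "i2"], lhs_complement_rhs=True)
print(rule_allowed(["i3", "i4"], ["i1"], **restrictions))  # True
print(rule_allowed(["i1"], ["i3"], **restrictions))        # False
print(rule_allowed(["i3"], ["i4"], **restrictions))        # False
```

Under this reading, the last rule would be allowed if lhs_complement_rhs were left at False, because neither i3 nor i4 is itself restricted to the right-hand-side.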
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
Besides the methods mentioned above, the FPGrowth class also inherits methods from the PALBase class. Please refer to PAL Base for more details.