FPGrowth
- class hana_ml.algorithms.pal.association.FPGrowth(min_support=None, min_confidence=None, relational=None, min_lift=None, max_conseq=None, max_len=None, ubiquitous=None, thread_ratio=None, timeout=None)
FP-Growth is an algorithm that finds frequent patterns in transactions without generating candidate itemsets.
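To make the idea of mining without candidate generation more concrete, here is a minimal pattern-growth sketch in plain Python. It is a simplification: it recurses on conditional pattern bases stored as plain lists rather than building a compressed FP-tree, and it is not the PAL implementation. The function name `frequent_itemsets` and the `min_count` argument are hypothetical and used only for illustration. The transactions are the ones from the Examples section.

```python
from collections import Counter

# The ten transactions from the Examples section, one set of items each.
transactions = [
    {1, 2}, {2, 3, 4}, {1, 3, 4, 5}, {1, 4, 5}, {1, 2},
    {1, 2, 3, 4}, {1}, {1, 2, 3}, {1, 2, 3}, {2, 3, 5},
]

def frequent_itemsets(transactions, min_count):
    """Return {frozenset(items): count} for itemsets seen at least min_count times."""
    results = {}

    def grow(weighted, suffix):
        # weighted is a list of (tuple_of_items, multiplicity) pairs.
        counts = Counter()
        for items, n in weighted:
            for item in items:
                counts[item] += n
        frequent = {i: c for i, c in counts.items() if c >= min_count}
        # Order items by descending count, breaking ties by item value.
        order = sorted(frequent, key=lambda i: (-frequent[i], i))
        rank = {item: r for r, item in enumerate(order)}
        # Drop infrequent items and sort every path in that order.
        pruned = [
            (tuple(sorted((i for i in items if i in rank), key=rank.get)), n)
            for items, n in weighted
        ]
        for item in reversed(order):  # least frequent item first
            new_suffix = suffix | {item}
            results[frozenset(new_suffix)] = frequent[item]
            # Conditional pattern base: the part of each path before `item`.
            cond = [(items[:items.index(item)], n) for items, n in pruned if item in items]
            cond = [(prefix, n) for prefix, n in cond if prefix]
            if cond:
                grow(cond, new_suffix)

    grow([(tuple(t), 1) for t in transactions], frozenset())
    return results

# min_count=2 corresponds to min_support=0.2 on ten transactions.
itemsets = frequent_itemsets(transactions, min_count=2)
```

Each frequent itemset is reached exactly once by extending a suffix with items from its conditional pattern base, so no candidate itemsets are enumerated and then tested against the full data.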
- Parameters
- min_support : float, optional
User-specified minimum support, with valid range [0, 1].
Defaults to 0.
- min_confidence : float, optional
User-specified minimum confidence, with valid range [0, 1].
Defaults to 0.
- relational : bool, optional
Whether or not to apply relational logic in the FPGrowth algorithm.
If False, a single result table is produced; otherwise, the result is split into three tables: antecedent, consequent and statistics.
Defaults to False.
- min_lift : float, optional
User-specified minimum lift.
Defaults to 0.
- max_conseq : int, optional
Maximum length of consequent items.
Defaults to 10.
- max_len : int, optional
Total length of antecedent items and consequent items in the output.
Defaults to 10.
- ubiquitous : float, optional
Item sets whose support values are greater than this number will be ignored during frequent items mining.
Defaults to 1.0.
- thread_ratio : float, optional
Specifies the ratio of total number of threads that can be used by this function.
The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads.
Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Defaults to 0.
- timeout : int, optional
Specifies the maximum run time in seconds.
The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.
Examples
Input data for association rule mining:
>>> df.collect()
    TRANS  ITEM
0       1     1
1       1     2
2       2     2
3       2     3
4       2     4
5       3     1
6       3     3
7       3     4
8       3     5
9       4     1
10      4     4
11      4     5
12      5     1
13      5     2
14      6     1
15      6     2
16      6     3
17      6     4
18      7     1
19      8     1
20      8     2
21      8     3
22      9     1
23      9     2
24      9     3
25     10     2
26     10     3
27     10     5
Set up parameters:
>>> fpg = FPGrowth(min_support=0.2, min_confidence=0.5, relational=False, min_lift=1.0, max_conseq=1, max_len=5, ubiquitous=1.0, thread_ratio=0, timeout=3600)
Mine association rules from the input data with the FPGrowth algorithm and check the results:
>>> fpg.fit(data=df, lhs_restrict=[1,2,3])
>>> fpg.result_.collect()
  ANTECEDENT  CONSEQUENT  SUPPORT  CONFIDENCE      LIFT
0          2           3      0.5    0.714286  1.190476
1          3           2      0.5    0.833333  1.190476
2          3           4      0.3    0.500000  1.250000
3        1&2           3      0.3    0.600000  1.000000
4        1&3           2      0.3    0.750000  1.071429
5        1&3           4      0.2    0.500000  1.250000
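
The SUPPORT, CONFIDENCE and LIFT values in the result above can be checked by hand with the standard definitions: support is the fraction of transactions containing all items of the rule, confidence is that support divided by the support of the antecedent, and lift is the confidence divided by the support of the consequent. The following plain-Python sketch recomputes them from the input data; the helper names `support` and `rule_stats` are hypothetical and not part of hana_ml.

```python
# Transactions from the input data above, keyed by TRANS.
transactions = {
    1: {1, 2}, 2: {2, 3, 4}, 3: {1, 3, 4, 5}, 4: {1, 4, 5}, 5: {1, 2},
    6: {1, 2, 3, 4}, 7: {1}, 8: {1, 2, 3}, 9: {1, 2, 3}, 10: {2, 3, 5},
}

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def rule_stats(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    sup = support(antecedent | consequent)
    conf = sup / support(antecedent)
    lift = conf / support(consequent)
    return sup, conf, lift

# Rule 0 in the result table: 2 -> 3
sup, conf, lift = rule_stats({2}, {3})
print(round(sup, 6), round(conf, 6), round(lift, 6))  # 0.5 0.714286 1.190476
```

A rule is reported only if these three values meet `min_support`, `min_confidence` and `min_lift` respectively.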
FPGrowth algorithm set up using relational logic:
>>> fpgr = FPGrowth(min_support=0.2, min_confidence=0.5, relational=True, min_lift=1.0, max_conseq=1, max_len=5, ubiquitous=1.0, thread_ratio=0, timeout=3600)
Mine association rules from the input data again with the FPGrowth algorithm, and check the resulting tables:
>>> fpgr.fit(data=df, rhs_restrict=[1, 2, 3])
>>> fpgr.antec_.collect()
   RULE_ID  ANTECEDENTITEM
0        0               2
1        1               3
2        2               3
3        3               1
4        3               2
5        4               1
6        4               3
7        5               1
8        5               3
>>> fpgr.conseq_.collect()
   RULE_ID  CONSEQUENTITEM
0        0               3
1        1               2
2        2               4
3        3               3
4        4               2
5        5               4
>>> fpgr.stats_.collect()
   RULE_ID  SUPPORT  CONFIDENCE      LIFT
0        0      0.5    0.714286  1.190476
1        1      0.5    0.833333  1.190476
2        2      0.3    0.500000  1.250000
3        3      0.3    0.600000  1.000000
4        4      0.3    0.750000  1.071429
5        5      0.2    0.500000  1.250000
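
The three relational tables are linked by RULE_ID. As a rough illustration of how they relate to the single-table form, the sketch below joins them back together in plain Python. The rows are copied from the outputs above as tuples so that the snippet runs without a database connection; the helper `group_items` is hypothetical, and joining items with `&` simply mirrors the ANTECEDENT column shown in the non-relational result.

```python
from collections import defaultdict

# Rows copied from the antec_, conseq_ and stats_ outputs above.
antec = [(0, 2), (1, 3), (2, 3), (3, 1), (3, 2), (4, 1), (4, 3), (5, 1), (5, 3)]
conseq = [(0, 3), (1, 2), (2, 4), (3, 3), (4, 2), (5, 4)]
stats = [
    (0, 0.5, 0.714286, 1.190476),
    (1, 0.5, 0.833333, 1.190476),
    (2, 0.3, 0.500000, 1.250000),
    (3, 0.3, 0.600000, 1.000000),
    (4, 0.3, 0.750000, 1.071429),
    (5, 0.2, 0.500000, 1.250000),
]

def group_items(rows):
    """Collect the items of each rule ID and join them with '&'."""
    grouped = defaultdict(list)
    for rule_id, item in rows:
        grouped[rule_id].append(item)
    return {rid: "&".join(str(i) for i in items) for rid, items in grouped.items()}

lhs = group_items(antec)
rhs = group_items(conseq)

# One (antecedent, consequent, support, confidence, lift) tuple per rule.
rules = [(lhs[rid], rhs[rid], sup, conf, lift) for rid, sup, conf, lift in stats]
for rule in rules:
    print(rule)
```

The relational form keeps one item per row, which can be easier to filter or join in SQL than the `&`-separated strings of the single result table.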
- Attributes
- result_ : DataFrame
Mined association rules and related statistics, structured as follows:
1st column : antecedent (leading) items,
2nd column : consequent (dependent) items,
3rd column : support value,
4th column : confidence value,
5th column : lift value.
Available only when relational is False.
- antec_ : DataFrame
Antecedent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : antecedent items of the corresponding association rule.
Available only when relational is True.
- conseq_ : DataFrame
Consequent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : consequent items of the corresponding association rule.
Available only when relational is True.
- stats_ : DataFrame
Statistics of the mined association rules, structured as follows:
1st column : rule ID,
2nd column : support value of the rule,
3rd column : confidence value of the rule,
4th column : lift value of the rule.
Available only when relational is True.
Methods
fit(data[, transaction, item, lhs_restrict, ...])
Association rule mining from the input data.
- fit(data, transaction=None, item=None, lhs_restrict=None, rhs_restrict=None, lhs_complement_rhs=None, rhs_complement_lhs=None)
Association rule mining from the input data.
- Parameters
- data : DataFrame
Input data for association rule mining.
- transaction : str, optional
Name of the transaction column.
Defaults to the first column if not provided.
- item : str, optional
Name of the item column.
Defaults to the last non-transaction column if not provided.
- lhs_restrict : list of int/str, optional
Specify items that are only allowed on the left-hand-side of association rules.
Elements in the list should be the same type as the item column.
- rhs_restrict : list of int/str, optional
Specify items that are only allowed on the right-hand-side of association rules.
Elements in the list should be the same type as the item column.
- lhs_complement_rhs : bool, optional
If you use rhs_restrict to restrict some items to the right-hand-side of the association rules, you can set this parameter to True to restrict the complement items to the left-hand-side.
For example, if you have 100 items (i1, i2, ..., i100), and want to restrict i1 and i2 to the right-hand-side, and i3, i4, ..., i100 to the left-hand-side, you can set the parameters as follows:
...
rhs_restrict = [i1, i2],
lhs_complement_rhs = True,
...
Defaults to False.
- rhs_complement_lhs : bool, optional
If you use lhs_restrict to restrict some items to the left-hand-side of association rules, you can set this parameter to True to restrict the complement items to the right-hand-side.
Defaults to False.
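
As a small illustration of what "complement items" means in the 100-item example given for lhs_complement_rhs, the sketch below computes the items that end up on the left-hand-side. It only shows the set arithmetic; the helper `complement_items` is hypothetical, the item names i1 to i100 are the placeholders from that example, and nothing here calls hana_ml.

```python
def complement_items(all_items, restricted):
    """Return the items of all_items that are not in restricted, in original order."""
    restricted_set = set(restricted)
    return [item for item in all_items if item not in restricted_set]

# Placeholder item names i1 ... i100 from the example.
all_items = [f"i{k}" for k in range(1, 101)]
rhs_restrict = ["i1", "i2"]

# With lhs_complement_rhs=True, these are the items restricted to the left-hand-side.
lhs_items = complement_items(all_items, rhs_restrict)
print(len(lhs_items), lhs_items[0], lhs_items[-1])  # 98 i3 i100
```

Setting the flag saves listing the 98 remaining items explicitly in lhs_restrict; rhs_complement_lhs works the same way with the two sides swapped.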
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.