Apriori
- class hana_ml.algorithms.pal.association.Apriori(min_support, min_confidence, relational=None, min_lift=None, max_conseq=None, max_len=None, ubiquitous=None, use_prefix_tree=None, lhs_restrict=None, rhs_restrict=None, lhs_complement_rhs=None, rhs_complement_lhs=None, thread_ratio=None, timeout=None, pmml_export=None)
Apriori is a classic predictive analysis algorithm for finding association rules used in association analysis.
- Parameters
- min_supportfloat
User-specified minimum support (actual value).
- min_confidencefloat
User-specified minimum confidence (actual value).
- relationalbool, optional
Whether or not to apply relational logic in Apriori algorithm. If False, a single result table is produced; otherwise, the result table shall be split into three tables: antecedent, consequent and statistics.
Defaults to False.
- min_liftfloat, optional
User-specified minimum lift.
Defaults to 0.
- max_conseqint, optional
Maximum length of consequent items.
Defaults to 100.
- max_lenint, optional
Total length of antecedent items and consequent items in the output.
Defaults to 5.
- ubiquitousfloat, optional
Item sets whose support values are greater than this number will be ignored during frequent items mining.
Defaults to 1.0.
- use_prefix_treebool, optional
Indicates whether or not to use prefix tree for saving memory.
Defaults to False.
- lhs_restrictlist of str, optional(deprecated)
Specify items that are only allowed on the left-hand-side of association rules.
- rhs_restrictlist of str, optional(deprecated)
Specify items that are only allowed on the right-hand-side of association rules.
- lhs_complement_rhsbool, optional(deprecated)
If you use rhs_restrict to restrict some items to the right-hand-side of the association rules, you can set this parameter to True to restrict the complement items to the left-hand-side.
For example, if you have 100 items (i1, i2, ..., i100), and want to restrict i1 and i2 to the right-hand-side, and i3, i4, ..., i100 to the left-hand-side, you can set the parameters as follows:
...
rhs_restrict = ['i1','i2'],
lhs_complement_rhs = True,
...
Defaults to False.
- rhs_complement_lhsbool, optional(deprecated)
If you use lhs_restrict to restrict some items to the left-hand-side of association rules, you can set this parameter to True to restrict the complement items to the right-hand-side.
Defaults to False.
- thread_ratiofloat, optional
Specifies the ratio of total number of threads that can be used by this function.
The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads.
Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Defaults to 0.
- timeoutint, optional
Specifies the maximum run time in seconds. The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.
- pmml_export{'no', 'single-row', 'multi-row'}, optional
Specify the way to export the Apriori model:
'no' : do not export the model,
'single-row' : export Apriori model in PMML in single row,
'multi-row' : export Apriori model in PMML in multiple rows, while the minimum length of each row is 5000 characters.
Defaults to 'no'.
Examples
Input data for association rule mining:
>>> df.collect()
    CUSTOMER   ITEM
0          2  item2
1          2  item3
2          3  item1
3          3  item2
4          3  item4
5          4  item1
6          4  item3
7          5  item2
8          5  item3
9          6  item1
10         6  item3
11         0  item1
12         0  item2
13         0  item5
14         1  item2
15         1  item4
16         7  item1
17         7  item2
18         7  item3
19         7  item5
20         8  item1
21         8  item2
22         8  item3
Set up parameters for the Apriori algorithm:
>>> ap = Apriori(min_support=0.1, min_confidence=0.3, relational=False, min_lift=1.1, max_conseq=1, max_len=5, ubiquitous=1.0, use_prefix_tree=False, thread_ratio=0, timeout=3600, pmml_export='single-row')
Association rule mining using Apriori algorithm for the input data, and check the results:
>>> ap.fit(data=df)
>>> ap.result_.head(5).collect()
    ANTECEDENT CONSEQUENT   SUPPORT  CONFIDENCE      LIFT
0        item5      item2  0.222222    1.000000  1.285714
1        item1      item5  0.222222    0.333333  1.500000
2        item5      item1  0.222222    1.000000  1.500000
3        item4      item2  0.222222    1.000000  1.285714
4  item2&item1      item5  0.222222    0.500000  2.250000
Apriori algorithm set up using relational logic:
>>> apr = Apriori(min_support=0.1, min_confidence=0.3, relational=True, min_lift=1.1, max_conseq=1, max_len=5, ubiquitous=1.0, use_prefix_tree=False, thread_ratio=0, timeout=3600, pmml_export='single-row')
Mine association rules from the same input data again, and check the resulting tables:
>>> apr.fit(data=df)
>>> apr.antec_.head(5).collect()
   RULE_ID ANTECEDENTITEM
0        0          item5
1        1          item1
2        2          item5
3        3          item4
4        4          item2
>>> apr.conseq_.head(5).collect()
   RULE_ID CONSEQUENTITEM
0        0          item2
1        1          item5
2        2          item1
3        3          item2
4        4          item5
>>> apr.stats_.head(5).collect()
   RULE_ID   SUPPORT  CONFIDENCE      LIFT
0        0  0.222222    1.000000  1.285714
1        1  0.222222    0.333333  1.500000
2        2  0.222222    1.000000  1.500000
3        3  0.222222    1.000000  1.285714
4        4  0.222222    0.500000  2.250000
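The support, confidence and lift values reported in the examples above can be cross-checked by hand. The following is a minimal sketch in plain Python, independent of hana_ml and of the PAL implementation; the helper names support and rule_stats are hypothetical and exist only for this illustration. It uses the standard definitions: support is the fraction of transactions containing all items of the rule, confidence is that support divided by the antecedent's support, and lift is confidence divided by the consequent's support.

```python
from collections import defaultdict

# (CUSTOMER, ITEM) rows copied from the input data shown in the example.
rows = [
    (2, "item2"), (2, "item3"), (3, "item1"), (3, "item2"), (3, "item4"),
    (4, "item1"), (4, "item3"), (5, "item2"), (5, "item3"), (6, "item1"),
    (6, "item3"), (0, "item1"), (0, "item2"), (0, "item5"), (1, "item2"),
    (1, "item4"), (7, "item1"), (7, "item2"), (7, "item3"), (7, "item5"),
    (8, "item1"), (8, "item2"), (8, "item3"),
]

# Group the items of each customer into one transaction (basket).
baskets = defaultdict(set)
for customer, item in rows:
    baskets[customer].add(item)


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets.values()) / len(baskets)


def rule_stats(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent))
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return both, confidence, lift


# Rule 0 of the example output: item5 -> item2
print([round(v, 6) for v in rule_stats({"item5"}, {"item2"})])
```

For the rule item5 -> item2, the two baskets containing item5 also contain item2, and item2 occurs in 7 of the 9 baskets, which gives the 0.222222, 1.000000 and 1.285714 shown in the first row of the result table.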
- Attributes
- result_DataFrame
Mined association rules and related statistics, structured as follows:
1st column : antecedent(leading) items.
2nd column : consequent(dependent) items.
3rd column : support value.
4th column : confidence value.
5th column : lift value.
Available only when relational is False.
- model_DataFrame
Apriori model trained from the input data, structured as follows:
1st column : model ID,
2nd column : model content, i.e. Apriori model in PMML format.
- antec_DataFrame
Antecedent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : antecedent items of the corresponding association rule.
Available only when relational is True.
- conseq_DataFrame
Consequent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : consequent items of the corresponding association rule.
Available only when relational is True.
- stats_DataFrame
Statistics of the mined association rules, structured as follows:
1st column : rule ID,
2nd column : support value of the rule,
3rd column : confidence value of the rule,
4th column : lift value of the rule.
Available only when relational is True.
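When relational is True, the three tables share the rule ID, so they can be recombined into the single-table layout of result_. The sketch below does this in plain Python on rows copied from the example output, after they have been collected to the client. It assumes that the antecedent and consequent tables hold one item per row, which the column names ANTECEDENTITEM and CONSEQUENTITEM suggest but the text above does not state. The second antecedent row for rule 4 is also an assumption, inferred from "item2&item1" in result_, since head(5) would not have displayed it. The helper group is hypothetical.

```python
from collections import defaultdict

# (RULE_ID, item) rows as displayed in the relational example.
# ASSUMPTION: the row (4, "item1") is inferred from "item2&item1" in result_;
# it is not visible in the head(5) output.
antec = [(0, "item5"), (1, "item1"), (2, "item5"), (3, "item4"),
         (4, "item2"), (4, "item1")]
conseq = [(0, "item2"), (1, "item5"), (2, "item1"), (3, "item2"), (4, "item5")]

# RULE_ID -> (SUPPORT, CONFIDENCE, LIFT), copied from the stats_ output.
stats = {
    0: (0.222222, 1.000000, 1.285714),
    1: (0.222222, 0.333333, 1.500000),
    2: (0.222222, 1.000000, 1.500000),
    3: (0.222222, 1.000000, 1.285714),
    4: (0.222222, 0.500000, 2.250000),
}


def group(rows):
    """Collect the items of each rule ID, keeping their row order."""
    grouped = defaultdict(list)
    for rule_id, item in rows:
        grouped[rule_id].append(item)
    return grouped


lhs, rhs = group(antec), group(conseq)

# One tuple per rule: (ANTECEDENT, CONSEQUENT, SUPPORT, CONFIDENCE, LIFT).
rules = [
    ("&".join(lhs[rule_id]), "&".join(rhs[rule_id]), *stats[rule_id])
    for rule_id in sorted(stats)
]

for rule in rules:
    print(rule)
```

Keeping the items in separate rows makes it easy to filter or join on individual items in SQL, while the "&"-joined form is more convenient for display.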
Methods
fit(data[, transaction, item, lhs_restrict, ...])
Association rule mining from the input data using the Apriori algorithm.
- fit(data, transaction=None, item=None, lhs_restrict=None, rhs_restrict=None, lhs_complement_rhs=None, rhs_complement_lhs=None)
Association rule mining from the input data using the Apriori algorithm.
- Parameters
- dataDataFrame
Input data for association rule mining.
- transactionstr, optional
Name of the transaction column.
Defaults to the first column if not provided.
- itemstr, optional
Name of the item ID column.
Data type of item column can be INTEGER, VARCHAR or NVARCHAR.
Defaults to the last non-transaction column if not provided.
- lhs_restrictlist of int/str, optional
Specify items that are only allowed on the left-hand-side of association rules.
Elements in the list should be the same type as the item column.
- rhs_restrictlist of int/str, optional
Specify items that are only allowed on the right-hand-side of association rules.
Elements in the list should be the same type as the item column.
- lhs_complement_rhsbool, optional
If you use rhs_restrict to restrict some items to the right-hand-side of the association rules, you can set this parameter to True to restrict the complement items to the left-hand-side.
For example, if you have 100 items (i1, i2, ..., i100), and want to restrict i1 and i2 to the right-hand-side, and i3, i4, ..., i100 to the left-hand-side, you can set the parameters as follows:
...
rhs_restrict = [i1, i2],
lhs_complement_rhs = True,
...
Defaults to False.
- rhs_complement_lhsbool, optional
If you use lhs_restrict to restrict some items to the left-hand-side of association rules, you can set this parameter to True to restrict the complement items to the right-hand-side.
Defaults to False.
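The effect of the four restriction parameters on the rule set can be illustrated with a small filter over already-mined rules. This is only a sketch of one reading of the parameter descriptions above, written in plain Python; the function allowed is hypothetical, and the actual PAL procedure presumably applies the restrictions during mining rather than afterwards, so its behaviour may differ in edge cases.

```python
def allowed(antecedent, consequent, lhs_restrict=(), rhs_restrict=(),
            lhs_complement_rhs=False, rhs_complement_lhs=False):
    """Return True if the rule antecedent -> consequent respects the restrictions."""
    antecedent, consequent = set(antecedent), set(consequent)
    lhs_only, rhs_only = set(lhs_restrict), set(rhs_restrict)

    # Items restricted to one side must not appear on the other side.
    if antecedent & rhs_only or consequent & lhs_only:
        return False
    # lhs_complement_rhs: every item outside rhs_restrict is confined to the
    # left-hand side, so the consequent may contain rhs_restrict items only.
    if lhs_complement_rhs and not consequent <= rhs_only:
        return False
    # rhs_complement_lhs: every item outside lhs_restrict is confined to the
    # right-hand side, so the antecedent may contain lhs_restrict items only.
    if rhs_complement_lhs and not antecedent <= lhs_only:
        return False
    return True


# The five rules from the example output, as (antecedent, consequent).
mined = [
    ({"item5"}, {"item2"}),
    ({"item1"}, {"item5"}),
    ({"item5"}, {"item1"}),
    ({"item4"}, {"item2"}),
    ({"item2", "item1"}, {"item5"}),
]

kept = [
    rule for rule in mined
    if allowed(*rule, rhs_restrict=["item5"], lhs_complement_rhs=True)
]
print(kept)
```

With rhs_restrict=['item5'] and lhs_complement_rhs=True, only the rules whose consequent is item5 and whose antecedent does not contain item5 remain, that is item1 -> item5 and item2&item1 -> item5.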
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.