Apriori
- class hana_ml.algorithms.pal.association.Apriori(min_support, min_confidence, relational=None, min_lift=None, max_conseq=None, max_len=None, ubiquitous=None, use_prefix_tree=None, lhs_restrict=None, rhs_restrict=None, lhs_complement_rhs=None, rhs_complement_lhs=None, thread_ratio=None, timeout=None, pmml_export=None)
Apriori is a classic algorithm for mining association rules from transaction data in association analysis.
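Apriori searches for frequent itemsets level by level, and the sketch below shows that search in plain Python. It is a minimal local illustration, not the PAL implementation. The function `frequent_itemsets` is hypothetical and works on an in-memory list of transactions rather than a HANA DataFrame. Its `min_support`, `max_len` and `ubiquitous` arguments only mirror the constructor parameters of the same names. Dropping ubiquitous itemsets entirely, so that no superset of them is ever generated, is an assumption about how PAL ignores them.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=5, ubiquitous=1.0):
    """Hypothetical level-wise Apriori sketch; returns {frozenset: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    result = {}
    # Level 1 candidates: every single item seen in the data.
    level = [frozenset([item]) for item in sorted(set().union(*baskets))]
    k = 1
    while level and k <= max_len:
        # Count step: keep candidates meeting min_support. Assumption: itemsets
        # whose support exceeds `ubiquitous` are dropped and never extended.
        frequent = {}
        for cand in level:
            s = support(cand)
            if min_support <= s <= ubiquitous:
                frequent[cand] = s
        result.update(frequent)
        k += 1
        # Join step: union pairs of frequent itemsets into size-k candidates.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        level = [c for c in joined
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))]
    return result


# Transactions of the example input data from the Examples section.
baskets = [
    ["item1", "item2", "item5"], ["item2", "item4"], ["item2", "item3"],
    ["item1", "item2", "item4"], ["item1", "item3"], ["item2", "item3"],
    ["item1", "item3"], ["item1", "item2", "item3", "item5"],
    ["item1", "item2", "item3"],
]
itemsets = frequent_itemsets(baskets, min_support=0.1)
print(round(itemsets[frozenset({"item1", "item2", "item5"})], 6))  # 0.222222
```

The prune step is what makes Apriori cheaper than brute-force enumeration: a candidate is counted only if all of its smaller subsets already passed `min_support`.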
- Parameters
- min_support : float
User-specified minimum support (actual value).
- min_confidence : float
User-specified minimum confidence (actual value).
- relational : bool, optional
Whether to apply relational logic in the Apriori algorithm. If False, a single result table is produced; otherwise, the result table is split into three tables: antecedent, consequent and statistics.
Defaults to False.
- min_lift : float, optional
User-specified minimum lift.
Defaults to 0.
- max_conseq : int, optional
Maximum length of consequent items.
Defaults to 100.
- max_len : int, optional
Maximum total length of antecedent and consequent items in the output.
Defaults to 5.
- ubiquitous : float, optional
Itemsets whose support values are greater than this number are ignored during frequent itemset mining.
Defaults to 1.0.
- use_prefix_tree : bool, optional
Indicates whether to use a prefix tree to save memory.
Defaults to False.
- lhs_restrict : list of str, optional (deprecated)
Specify items that are only allowed on the left-hand side of association rules.
- rhs_restrict : list of str, optional (deprecated)
Specify items that are only allowed on the right-hand side of association rules.
- lhs_complement_rhs : bool, optional (deprecated)
If you use rhs_restrict to restrict some items to the right-hand side of the association rules, you can set this parameter to True to restrict the complement items to the left-hand side. For example, if you have 100 items (i1, i2, ..., i100) and want to restrict i1 and i2 to the right-hand side and i3, i4, ..., i100 to the left-hand side, you can set the parameters as follows:
...
rhs_restrict = ['i1','i2'],
lhs_complement_rhs = True,
...
Defaults to False.
- rhs_complement_lhs : bool, optional (deprecated)
If you use lhs_restrict to restrict some items to the left-hand side of the association rules, you can set this parameter to True to restrict the complement items to the right-hand side.
Defaults to False.
- thread_ratio : float, optional
Specifies the ratio of the total number of threads that can be used by this function.
The value range is from 0 to 1, where 0 means using only 1 thread and 1 means using at most all currently available threads.
Values outside this range are ignored, and this function heuristically determines the number of threads to use.
Defaults to 0.
- timeout : int, optional
Specifies the maximum run time in seconds. The algorithm stops running when the specified timeout is reached.
Defaults to 3600.
- pmml_export : {'no', 'single-row', 'multi-row'}, optional
Specify the way to export the Apriori model:
'no' : do not export the model,
'single-row' : export Apriori model in PMML in single row,
'multi-row' : export Apriori model in PMML in multiple rows, while the minimum length of each row is 5000 characters.
Defaults to 'no'.
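The rule-level thresholds above (min_confidence, min_lift, max_conseq, max_len) can be read as one filter applied to each candidate rule. The sketch below is hypothetical: the `Rule` record and the `passes_thresholds` function are illustrative names, and treating every bound as inclusive is an assumption rather than documented PAL behavior. Support is not rechecked, since a rule built from a frequent itemset already meets min_support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical candidate rule: antecedent -> consequent with its statistics."""
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float
    lift: float


def passes_thresholds(rule, min_confidence, min_lift=0.0, max_conseq=100, max_len=5):
    """Keep a rule only if it satisfies every bound (bounds assumed inclusive)."""
    return (
        rule.confidence >= min_confidence
        and rule.lift >= min_lift
        and len(rule.consequent) <= max_conseq
        and len(rule.antecedent) + len(rule.consequent) <= max_len
    )


# Statistics computed from the example input data; thresholds as in the Examples.
kept = Rule(frozenset({"item5"}), frozenset({"item2"}), 2 / 9, 1.0, 9 / 7)
two_conseq = Rule(frozenset({"item5"}), frozenset({"item1", "item2"}), 2 / 9, 1.0, 2.25)
print(passes_thresholds(kept, 0.3, min_lift=1.1, max_conseq=1))        # True
print(passes_thresholds(two_conseq, 0.3, min_lift=1.1, max_conseq=1))  # False
```

The second rule is strong (confidence 1.0, lift 2.25) but is rejected only because its consequent has two items, which is why max_conseq=1 in the Examples keeps every consequent to a single item.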
Examples
Input data for association rule mining:
>>> df.collect()
    CUSTOMER   ITEM
0          2  item2
1          2  item3
2          3  item1
3          3  item2
4          3  item4
5          4  item1
6          4  item3
7          5  item2
8          5  item3
9          6  item1
10         6  item3
11         0  item1
12         0  item2
13         0  item5
14         1  item2
15         1  item4
16         7  item1
17         7  item2
18         7  item3
19         7  item5
20         8  item1
21         8  item2
22         8  item3
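The input is in long format, one row per (transaction, item) pair, while the miner reasons about the set of items in each transaction. A minimal sketch of that grouping on plain Python rows follows. The helper `to_transactions` is hypothetical and does not call hana_ml; it only mimics the column defaults documented for fit (transaction defaults to the first column, item to the last non-transaction column).

```python
from collections import defaultdict


def to_transactions(rows, columns, transaction=None, item=None):
    """Hypothetical helper: group long-format rows into {transaction: set of items}."""
    if transaction is None:
        transaction = columns[0]  # default: first column
    if item is None:
        # default: last column that is not the transaction column
        item = [c for c in columns if c != transaction][-1]
    t_idx, i_idx = columns.index(transaction), columns.index(item)
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[t_idx]].add(row[i_idx])
    return dict(baskets)


# The 23 rows of the example input data, in the order shown above.
rows = [
    (2, "item2"), (2, "item3"), (3, "item1"), (3, "item2"), (3, "item4"),
    (4, "item1"), (4, "item3"), (5, "item2"), (5, "item3"), (6, "item1"),
    (6, "item3"), (0, "item1"), (0, "item2"), (0, "item5"), (1, "item2"),
    (1, "item4"), (7, "item1"), (7, "item2"), (7, "item3"), (7, "item5"),
    (8, "item1"), (8, "item2"), (8, "item3"),
]
baskets = to_transactions(rows, ["CUSTOMER", "ITEM"])
print(len(baskets), sorted(baskets[7]))  # 9 ['item1', 'item2', 'item3', 'item5']
```

So the example contains 9 transactions, and every support value in the results is a multiple of 1/9.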
Set up parameters for the Apriori algorithm:
>>> ap = Apriori(min_support=0.1, min_confidence=0.3, relational=False, min_lift=1.1, max_conseq=1, max_len=5, ubiquitous=1.0, use_prefix_tree=False, thread_ratio=0, timeout=3600, pmml_export='single-row')
Mine association rules from the input data with the Apriori algorithm and check the results:
>>> ap.fit(data=df)
>>> ap.result_.head(5).collect()
    ANTECEDENT CONSEQUENT   SUPPORT  CONFIDENCE      LIFT
0        item5      item2  0.222222    1.000000  1.285714
1        item1      item5  0.222222    0.333333  1.500000
2        item5      item1  0.222222    1.000000  1.500000
3        item4      item2  0.222222    1.000000  1.285714
4  item2&item1      item5  0.222222    0.500000  2.250000
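The values in the result table agree with the standard definitions: support is the fraction of transactions containing every antecedent and consequent item, confidence is that support divided by the support of the antecedent, and lift is the confidence divided by the support of the consequent. The plain-Python check below recomputes the five rows shown from the example transactions; `rule_metrics` is a hypothetical helper, not part of hana_ml.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Hypothetical helper: (support, confidence, lift) of antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    lhs, rhs = set(antecedent), set(consequent)
    n_lhs = sum(1 for b in baskets if lhs <= b)
    n_rhs = sum(1 for b in baskets if rhs <= b)
    n_both = sum(1 for b in baskets if (lhs | rhs) <= b)
    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return support, confidence, lift


# Transactions of the example input data, one list per CUSTOMER 0..8.
baskets = [
    ["item1", "item2", "item5"], ["item2", "item4"], ["item2", "item3"],
    ["item1", "item2", "item4"], ["item1", "item3"], ["item2", "item3"],
    ["item1", "item3"], ["item1", "item2", "item3", "item5"],
    ["item1", "item2", "item3"],
]
rules = [
    (["item5"], ["item2"]), (["item1"], ["item5"]), (["item5"], ["item1"]),
    (["item4"], ["item2"]), (["item2", "item1"], ["item5"]),
]
for lhs, rhs in rules:
    support, confidence, lift = rule_metrics(baskets, lhs, rhs)
    # Each printed line matches the corresponding row of ap.result_ above.
    print("&".join(lhs), "->", "&".join(rhs),
          f"{support:.6f} {confidence:.6f} {lift:.6f}")
```

For example, item5 appears in 2 of the 9 transactions and always together with item2, which appears in 7, so the first rule has confidence 1 and lift 1 / (7/9) = 9/7, about 1.285714.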
Set up the Apriori algorithm with relational logic:
>>> apr = Apriori(min_support=0.1, min_confidence=0.3, relational=True, min_lift=1.1, max_conseq=1, max_len=5, ubiquitous=1.0, use_prefix_tree=False, thread_ratio=0, timeout=3600, pmml_export='single-row')
Mine association rules from the input data again and check the resulting tables:
>>> apr.fit(data=df)
>>> apr.antec_.head(5).collect()
   RULE_ID ANTECEDENTITEM
0        0          item5
1        1          item1
2        2          item5
3        3          item4
4        4          item2
>>> apr.conseq_.head(5).collect()
   RULE_ID CONSEQUENTITEM
0        0          item2
1        1          item5
2        2          item1
3        3          item2
4        4          item5
>>> apr.stats_.head(5).collect()
   RULE_ID   SUPPORT  CONFIDENCE      LIFT
0        0  0.222222    1.000000  1.285714
1        1  0.222222    0.333333  1.500000
2        2  0.222222    1.000000  1.500000
3        3  0.222222    1.000000  1.285714
4        4  0.222222    0.500000  2.250000
- Attributes
- result_ : DataFrame
Mined association rules and related statistics, structured as follows:
1st column : antecedent (leading) items.
2nd column : consequent (dependent) items.
3rd column : support value.
4th column : confidence value.
5th column : lift value.
Available only when relational is False.
- model_ : DataFrame
Apriori model trained from the input data, structured as follows:
1st column : model ID,
2nd column : model content, i.e. Apriori model in PMML format.
- antec_ : DataFrame
Antecedent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : antecedent items of the corresponding association rule.
Available only when relational is True.
- conseq_ : DataFrame
Consequent items of mined association rules, structured as follows:
1st column : association rule ID,
2nd column : consequent items of the corresponding association rule.
Available only when relational is True.
- stats_ : DataFrame
Statistics of the mined association rules, structured as follows:
1st column : rule ID,
2nd column : support value of the rule,
3rd column : confidence value of the rule,
4th column : lift value of the rule.
Available only when relational is True.
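When relational is True, the information in result_ is spread over three tables keyed by rule ID. The sketch below shows one way to derive that layout from single-table rows. The function `split_relational` is hypothetical. Two behaviors are inferred from the example output rather than documented: rule IDs are assumed to be assigned from 0 in row order, and a multi-item side such as item2&item1 is assumed to become one row per item.

```python
def split_relational(rules):
    """Hypothetical split of result_-style rows into antec, conseq and stats rows.

    Each rule is (antecedent, consequent, support, confidence, lift), where the
    antecedent and consequent are '&'-joined item strings as shown in result_.
    """
    antec, conseq, stats = [], [], []
    for rule_id, (lhs, rhs, support, confidence, lift) in enumerate(rules):
        # One (RULE_ID, ITEM) row per item on each side of the rule.
        antec.extend((rule_id, item) for item in lhs.split("&"))
        conseq.extend((rule_id, item) for item in rhs.split("&"))
        stats.append((rule_id, support, confidence, lift))
    return antec, conseq, stats


# The first five rows of ap.result_ from the Examples section.
result_rows = [
    ("item5", "item2", 0.222222, 1.000000, 1.285714),
    ("item1", "item5", 0.222222, 0.333333, 1.500000),
    ("item5", "item1", 0.222222, 1.000000, 1.500000),
    ("item4", "item2", 0.222222, 1.000000, 1.285714),
    ("item2&item1", "item5", 0.222222, 0.500000, 2.250000),
]
antec, conseq, stats = split_relational(result_rows)
print(antec[4:])  # [(4, 'item2'), (4, 'item1')]
```

This normalized layout avoids packing several items into one string column, which makes it easier to join the rules back to item master data by item ID.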
Methods
fit(data[, transaction, item, lhs_restrict, ...])
Association rule mining from the input data using the Apriori algorithm.
- fit(data, transaction=None, item=None, lhs_restrict=None, rhs_restrict=None, lhs_complement_rhs=None, rhs_complement_lhs=None)
Association rule mining from the input data using the Apriori algorithm.
- Parameters
- data : DataFrame
Input data for association rule mining.
- transaction : str, optional
Name of the transaction column.
Defaults to the first column if not provided.
- item : str, optional
Name of the item ID column.
The data type of the item column can be INTEGER, VARCHAR or NVARCHAR.
Defaults to the last non-transaction column if not provided.
- lhs_restrict : list of int/str, optional
Specify items that are only allowed on the left-hand side of association rules.
Elements in the list should be of the same type as the item column.
- rhs_restrict : list of int/str, optional
Specify items that are only allowed on the right-hand side of association rules.
Elements in the list should be of the same type as the item column.
- lhs_complement_rhs : bool, optional
If you use rhs_restrict to restrict some items to the right-hand side of the association rules, you can set this parameter to True to restrict the complement items to the left-hand side. For example, if you have 100 items (i1, i2, ..., i100) and want to restrict i1 and i2 to the right-hand side and i3, i4, ..., i100 to the left-hand side, you can set the parameters as follows:
...
rhs_restrict = ['i1', 'i2'],
lhs_complement_rhs = True,
...
Defaults to False.
- rhs_complement_lhs : bool, optional
If you use lhs_restrict to restrict some items to the left-hand side of the association rules, you can set this parameter to True to restrict the complement items to the right-hand side.
Defaults to False.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.
Inherited Methods from PALBase
Besides the methods mentioned above, the Apriori class also inherits methods from the PALBase class; please refer to PAL Base for more details.