KORD
- class hana_ml.algorithms.pal.association.KORD(k=None, measure=None, min_support=None, min_confidence=None, min_coverage=None, min_measure=None, max_antec=None, epsilon=None, use_epsilon=None, max_conseq=None)
The K-Optimal Rule Discovery (KORD) algorithm generates the top-K association rules according to a user-defined measure. Unlike traditional association rule mining, which must discover frequent itemsets before creating rules, KORD directly identifies optimal rules. This is useful in tasks such as market basket analysis and recommendation systems, where discovering associations between items is critical.
- Parameters:
- k : int, optional
Specifies the number of top-k highest priority rules to discover.
Defaults to 10.
- measure : {'leverage', 'lift', 'coverage', 'confidence'}, optional
Defines the priority measure for the association rules.
Defaults to 'leverage'.
- min_support : float, optional
Minimum support value of an association rule, within [0, 1] range.
Defaults to 0.
- min_confidence : float, optional
Minimum confidence value of an association rule, within [0, 1] range.
Defaults to 0.
- min_coverage : float, optional
Minimum coverage value of an association rule, within [0, 1] range.
Defaults to the value of min_support if not provided.
- min_measure : float, optional
Minimum measure value (either leverage or lift, depending on the measure setting).
Defaults to 0.
- max_antec : int, optional
Maximum number of antecedent items in generated rules.
Defaults to 4.
- epsilon : float, optional
Epsilon value used for penalizing the length of rules.
This parameter is valid only when use_epsilon is True.
Defaults to 0.0.
- use_epsilon : bool, optional
Specifies whether the length of rules should be penalized using epsilon.
Defaults to False.
- max_conseq : int, optional
Maximum number of consequent items in generated rules. Should not exceed 3.
Defaults to 1.
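To make the roles of k, measure, min_support, min_confidence and max_antec concrete, the following is a minimal brute-force sketch in plain Python. It is not the PAL implementation and does not reproduce KORD's search strategy: it simply enumerates every candidate rule with a single-item consequent, scores it with the standard textbook definitions of the four measures (an assumption, since the formulas are not stated here), and keeps the k best. The function name top_k_rules and the toy baskets are hypothetical.

```python
from itertools import combinations


def top_k_rules(transactions, k=10, measure="leverage",
                min_support=0.0, min_confidence=0.0, max_antec=4):
    """Naive top-k rule search with single-item consequents (illustration only)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def frac(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(1, max_antec + 1):
        for antec in combinations(items, size):
            a = frozenset(antec)
            coverage = frac(a)  # share of transactions matching the antecedent
            if coverage == 0:
                continue
            for conseq in items:
                if conseq in a:
                    continue
                support = frac(a | {conseq})
                confidence = support / coverage
                if support < min_support or confidence < min_confidence:
                    continue
                p_conseq = frac(frozenset([conseq]))
                # Textbook definitions, assumed here rather than taken from PAL.
                scores = {
                    "leverage": support - coverage * p_conseq,
                    "lift": confidence / p_conseq,
                    "coverage": coverage,
                    "confidence": confidence,
                }
                rules.append((scores[measure], antec, conseq))

    # Highest measure first; ties are ordered by antecedent, then consequent.
    rules.sort(key=lambda r: (-r[0], r[1], r[2]))
    return rules[:k]


# Hypothetical toy data: each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

best = top_k_rules(baskets, k=3, measure="lift", min_support=0.2)
for score, antec, conseq in best:
    print(antec, "->", conseq, round(score, 4))
```

Exhaustive enumeration like this grows combinatorially with the number of items, which is why it only serves to show what the parameters mean, not how the rules are found in practice.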
Examples
Input DataFrame df:
>>> df.head(10).collect()
   CUSTOMER   ITEM
0         2  item2
1         2  item3
...
8         5  item3
9         6  item1
Initialize a KORD object:
>>> krd = KORD(k=5, measure='lift', min_support=0.1, min_confidence=0.2, epsilon=0.1, use_epsilon=False)
Call fit() and retrieve the results:
>>> krd.fit(data=df, transaction='CUSTOMER', item='ITEM')
>>> krd.antec_.collect()
   RULE_ID ANTECEDENT_RULE
0        0           item2
1        1           item1
2        2           item2
3        2           item1
4        3           item5
5        4           item2
>>> krd.conseq_.collect()
   RULE_ID CONSEQUENT_RULE
0        0           item5
1        1           item5
2        2           item5
3        3           item1
4        4           item4
>>> krd.stats_.collect()
   RULE_ID   SUPPORT  CONFIDENCE      LIFT  LEVERAGE   MEASURE
0        0  0.222222    0.285714  1.285714  0.049383  1.285714
1        1  0.222222    0.333333  1.500000  0.074074  1.500000
2        2  0.222222    0.500000  2.250000  0.123457  2.250000
3        3  0.222222    1.000000  1.500000  0.074074  1.500000
4        4  0.222222    0.285714  1.285714  0.049383  1.285714
- Attributes:
- antec_ : DataFrame
Information about the antecedent items of the mined association rules, structured as follows:
1st column : rule ID,
2nd column : antecedent items.
- conseq_ : DataFrame
Information about the consequent items of the mined association rules, structured as follows:
1st column : rule ID,
2nd column : consequent items.
- stats_ : DataFrame
Some basic statistics for the mined association rules, structured as follows:
1st column : rule ID,
2nd column : support value of rules,
3rd column : confidence value of rules,
4th column : lift value of rules,
5th column : leverage value of rules,
6th column : measure value of rules.
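The statistics columns can be read with the standard textbook definitions of support, confidence, lift and leverage. The sketch below computes them from raw transaction counts; it assumes those standard definitions rather than quoting PAL's formulas, and the helper name rule_stats and the counts passed to it are hypothetical. The counts were chosen so that the result can be compared with rule 0 in the stats_ output of the example: rounded to six decimals they give 0.222222, 0.285714, 1.285714 and 0.049383.

```python
def rule_stats(n_transactions, n_antec, n_conseq, n_both):
    """Standard rule statistics from raw transaction counts (illustration only)."""
    support = n_both / n_transactions     # share containing antecedent and consequent
    coverage = n_antec / n_transactions   # share containing the antecedent
    p_conseq = n_conseq / n_transactions  # share containing the consequent
    confidence = support / coverage
    return {
        "support": support,
        "confidence": confidence,
        "lift": confidence / p_conseq,
        "leverage": support - coverage * p_conseq,
        "coverage": coverage,
    }


# Hypothetical counts: 9 transactions, antecedent in 7, consequent in 2, both in 2.
stats = rule_stats(9, 7, 2, 2)
for name, value in stats.items():
    print(f"{name:<10} {value:.6f}")
```

In the example, measure='lift' was used, which is consistent with the MEASURE column repeating the LIFT column.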
Methods
fit(data[, transaction, item]): Association rule mining on the given data.
- fit(data, transaction=None, item=None)
Association rule mining on the given data.
- Parameters:
- data : DataFrame
The input data.
- transaction : str, optional
Column name of transaction ID in the input data.
Defaults to the name of the first column if not provided.
- item : str, optional
Column name of item ID (or items) in the input data.
Defaults to the name of the last non-transaction column if not provided.
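The default column selection described above can be sketched in plain Python. This mirrors only the wording of the parameter descriptions; resolve_columns is a hypothetical helper, not part of hana_ml, and the column names are illustrative.

```python
def resolve_columns(columns, transaction=None, item=None):
    """Pick the transaction and item columns using the documented defaults."""
    if transaction is None:
        # Documented default: the first column.
        transaction = columns[0]
    if item is None:
        # Documented default: the last column that is not the transaction column.
        item = [c for c in columns if c != transaction][-1]
    return transaction, item


print(resolve_columns(["CUSTOMER", "ITEM"]))
print(resolve_columns(["ID", "CUSTOMER", "ITEM"], transaction="CUSTOMER"))
```

Passing both names explicitly, as in the example's fit(data=df, transaction='CUSTOMER', item='ITEM'), avoids depending on column order.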
Inherited Methods from PALBase
Besides the methods mentioned above, the KORD class also inherits methods from the PALBase class. Refer to PAL Base for more details.