KORD

class hana_ml.algorithms.pal.association.KORD(k=None, measure=None, min_support=None, min_confidence=None, min_coverage=None, min_measure=None, max_antec=None, epsilon=None, use_epsilon=None, max_conseq=None)

The K-Optimal Rule Discovery (KORD) algorithm generates the top-K association rules ranked by a user-defined measure. Unlike traditional association rule mining, which must first discover frequent itemsets before creating rules, KORD directly identifies the optimal rules. It is useful for tasks such as market basket analysis and recommendation systems, where discovering associations between items is critical.

Parameters:

k : int, optional

Number of top-ranked rules to discover, ordered by the chosen measure.

Defaults to 10.

measure : {'leverage', 'lift', 'coverage', 'confidence'}, optional

Defines the priority measure for the association rules.

Defaults to 'leverage'.

min_support : float, optional

Minimum support value of an association rule, within [0, 1] range.

Defaults to 0.

min_confidence : float, optional

Minimum confidence value of an association rule, within [0, 1] range.

Defaults to 0.

min_coverage : float, optional

Minimum coverage value of an association rule, within [0, 1] range.

Defaults to the value of min_support if not provided.

min_measure : float, optional

Minimum measure value (either leverage or lift, depending on the measure setting).

Defaults to 0.

max_antec : int, optional

Maximum number of antecedent items in generated rules.

Defaults to 4.

epsilon : float, optional

Epsilon value used for penalizing the length of rules.

This parameter is valid only when use_epsilon is True.

Defaults to 0.0.

use_epsilon : bool, optional

Specifies whether the length of rules is penalized using epsilon.

Defaults to False.

max_conseq : int, optional

Maximum number of consequent items in generated rules. Should not exceed 3.

Defaults to 1.

Attributes:

antec_ : DataFrame

Antecedent items of the mined association rules, structured as follows:

1st column : rule ID,

2nd column : antecedent items.

conseq_ : DataFrame

Consequent items of the mined association rules, structured as follows:

1st column : rule ID,

2nd column : consequent items.

stats_ : DataFrame

Basic statistics of the mined association rules, structured as follows:

1st column : rule ID,
2nd column : support value of rules,
3rd column : confidence value of rules,
4th column : lift value of rules,
5th column : leverage value of rules,
6th column : measure value of rules.
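The statistics listed above can be illustrated with a small pure-Python sketch. It uses the standard textbook definitions of support, confidence, lift, leverage and coverage, which are assumed here (not confirmed by this page) to match what PAL reports. The function name `rule_stats` and the nine transactions are hypothetical and exist only for illustration.

```python
def rule_stats(transactions, antecedent, consequent):
    """Compute basic statistics for the rule antecedent => consequent.

    Hypothetical helper using textbook definitions; not part of hana_ml.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = n_ac / n                 # P(A and C)
    coverage = n_a / n                 # P(A)
    confidence = n_ac / n_a            # P(C | A)
    lift = confidence / (n_c / n)      # P(C | A) / P(C)
    leverage = support - coverage * (n_c / n)  # P(A and C) - P(A) * P(C)
    return {
        "support": support,
        "coverage": coverage,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
    }


# Hypothetical data: "A" occurs in 7 of 9 transactions, "C" in 2, both in 2.
transactions = [
    {"A", "C"}, {"A", "C"}, {"A"}, {"A"}, {"A"}, {"A"}, {"A"}, {"B"}, {"B"},
]
stats = rule_stats(transactions, {"A"}, {"C"})
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

With `measure='lift'`, the measure column would be expected to repeat the lift value, and similarly for the other measures.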

Methods

fit(data[, transaction, item])

Association rule mining on the given data.

Examples

Input DataFrame df:

>>> df.head(10).collect()
    CUSTOMER   ITEM
0          2  item2
1          2  item3
...
8          5  item3
9          6  item1

Initialize a KORD object:

>>> from hana_ml.algorithms.pal.association import KORD
>>> krd = KORD(k=5,
...            measure='lift',
...            min_support=0.1,
...            min_confidence=0.2,
...            epsilon=0.1,
...            use_epsilon=False)

Perform the fit() and obtain the result:

>>> krd.fit(data=df, transaction='CUSTOMER', item='ITEM')
>>> krd.antec_.collect()
   RULE_ID ANTECEDENT_RULE
0        0           item2
1        1           item1
2        2           item2
3        2           item1
4        3           item5
5        4           item2
>>> krd.conseq_.collect()
   RULE_ID CONSEQUENT_RULE
0        0           item5
1        1           item5
2        2           item5
3        3           item1
4        4           item4
>>> krd.stats_.collect()
   RULE_ID   SUPPORT  CONFIDENCE      LIFT  LEVERAGE   MEASURE
0        0  0.222222    0.285714  1.285714  0.049383  1.285714
1        1  0.222222    0.333333  1.500000  0.074074  1.500000
2        2  0.222222    0.500000  2.250000  0.123457  2.250000
3        3  0.222222    1.000000  1.500000  0.074074  1.500000
4        4  0.222222    0.285714  1.285714  0.049383  1.285714
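Because antecedents and consequents are returned in separate tables keyed by rule ID, it can help to join them into readable rules. The following is a minimal sketch in plain Python: the rows are copied by hand from the output above so that it runs without a database connection, and the names `group_items` and `lift_by_rule` are hypothetical. With collected pandas frames, equivalent rows could presumably be obtained with `itertuples(index=False, name=None)`.

```python
from collections import defaultdict

# (RULE_ID, item) rows copied from the antec_ and conseq_ output above.
antec_rows = [(0, "item2"), (1, "item1"), (2, "item2"),
              (2, "item1"), (3, "item5"), (4, "item2")]
conseq_rows = [(0, "item5"), (1, "item5"), (2, "item5"),
               (3, "item1"), (4, "item4")]
# RULE_ID -> LIFT, copied from the stats_ output above.
lift_by_rule = {0: 1.285714, 1: 1.5, 2: 2.25, 3: 1.5, 4: 1.285714}


def group_items(rows):
    """Collect the items of each rule into a list, keeping row order."""
    grouped = defaultdict(list)
    for rule_id, item in rows:
        grouped[rule_id].append(item)
    return grouped


antecedents = group_items(antec_rows)
consequents = group_items(conseq_rows)

# Build one readable line per rule, highest lift first.
rules = [
    (rule_id,
     f"{', '.join(antecedents[rule_id])} => {', '.join(consequents[rule_id])}",
     lift)
    for rule_id, lift in sorted(lift_by_rule.items(),
                                key=lambda pair: pair[1], reverse=True)
]
for rule_id, text, lift in rules:
    print(f"rule {rule_id}: {text} (lift={lift})")
```

Rule 2 is the only rule here with two antecedent items, which is why it occupies two rows in the antecedent table.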

fit(data, transaction=None, item=None)

Association rule mining on the given data.

Parameters:

data : DataFrame

The input data.

transaction : str, optional

Column name of the transaction ID in the input data.

Defaults to the name of the first column if not provided.

item : str, optional

Column name of item ID (or items) in the input data.

Defaults to the name of the last non-transaction column if not provided.
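The column defaults described above can be sketched as a small helper. This is only an illustration of the documented behavior, not the library's implementation; `resolve_columns` is a hypothetical name, and `columns` stands for the list of column names of the input data.

```python
def resolve_columns(columns, transaction=None, item=None):
    """Hypothetical helper mirroring the documented defaults of fit()."""
    columns = list(columns)
    if transaction is None:
        # Documented default: the first column.
        transaction = columns[0]
    if item is None:
        # Documented default: the last column that is not the transaction column.
        remaining = [name for name in columns if name != transaction]
        item = remaining[-1]
    return transaction, item


print(resolve_columns(["CUSTOMER", "ITEM"]))
print(resolve_columns(["ID", "CUSTOMER", "ITEM"], transaction="CUSTOMER"))
```

Passing both column names explicitly, as in the example above, avoids depending on column order.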

Inherited Methods from PALBase

In addition to the methods described above, the KORD class inherits methods from the PALBase class. Please refer to PAL Base for more details.