KORD

class hana_ml.algorithms.pal.association.KORD(k=None, measure=None, min_support=None, min_confidence=None, min_coverage=None, min_measure=None, max_antec=None, epsilon=None, use_epsilon=None, max_conseq=None)

K-optimal rule discovery (KORD) follows the idea of generating association rules with respect to a well-defined measure, instead of first finding all frequent itemsets and then generating all possible rules.

Parameters
kint, optional

The number of top rules to discover.

measure{'leverage', 'lift', 'coverage', 'confidence'}, optional

Specifies the measure used to define the priority of the association rules.

Defaults to 'leverage'.

min_supportfloat, optional

User-specified minimum support value of association rule, with valid range [0, 1].

Defaults to 0 if not provided.

min_confidencefloat, optinal

User-specified minimum confidence value of association rule, with valid range [0, 1].

Defaults to 0 if not provided.

min_converagefloat, optional

User-specified minimum coverage value of association rule, with valid range [0, 1].

Defaults to the value of min_support if not provided.

min_measurefloat, optional

User-specified minimum measure value (for leverage or lift, which type depends on the setting of measure).

Defaults to 0 if not provided.

max_antecint, optional

Specifies the maximum number of antecedent items in generated association rules.

Defaults to 4.

epsilonfloat, optional

User-specified epsilon value for punishing length of rules.

Valid only when use_epsilon is True.

use_epsilonbool, optional

Specifies whether or not to use epsilon to punish the length of rules.

Defaults to False.

max_conseqint, optional

Specifies the maximum number of consequent items in generated association rules.

Should not be greater than 3.

Defaults to 1.

Examples

First let us have a look at the training data:

>>> df.head(10).collect()
    CUSTOMER   ITEM
0          2  item2
1          2  item3
2          3  item1
3          3  item2
4          3  item4
5          4  item1
6          4  item3
7          5  item2
8          5  item3
9          6  item1

Set up a KORD instance:

>>> krd =  KORD(k=5,
                measure='lift',
                min_support=0.1,
                min_confidence=0.2,
                epsilon=0.1,
                use_epsilon=False)

Start k-optimal rule discovery process from the input transaction data, and check the results:

>>> krd.fit(data=df, transaction='CUSTOMER', item='ITEM')
>>> krd.antec_.collect()
   RULE_ID ANTECEDENT_RULE
0        0           item2
1        1           item1
2        2           item2
3        2           item1
4        3           item5
5        4           item2
>>> krd.conseq_.collect()
   RULE_ID CONSEQUENT_RULE
0        0           item5
1        1           item5
2        2           item5
3        3           item1
4        4           item4
>>> krd.stats_.collect()
   RULE_ID   SUPPORT  CONFIDENCE      LIFT  LEVERAGE   MEASURE
0        0  0.222222    0.285714  1.285714  0.049383  1.285714
1        1  0.222222    0.333333  1.500000  0.074074  1.500000
2        2  0.222222    0.500000  2.250000  0.123457  2.250000
3        3  0.222222    1.000000  1.500000  0.074074  1.500000
4        4  0.222222    0.285714  1.285714  0.049383  1.285714
Attributes
antec_DataFrame

Info of antecedent items for the mined association rules, structured as follows:

  • 1st column : rule ID,

  • 2nd column : antecedent items.

conseq_DataFrame

Info of consequent items for the mined association rules, structured as follows:

  • 1st column : rule ID,

  • 2nd column : consequent items.

stats_DataFrame
Some basic statistics for the mined association rules, structured as follows:
  • 1st column : rule ID,

  • 2nd column : support value of rules,

  • 3rd column : confidence value of rules,

  • 4th column : lift value of rules,

  • 5th column : leverage value of rules,

  • 6th column : measure value of rules.

Methods

fit(data[, transaction, item])

K-optimal rule discovery from input data, based on some user-specified measure.

fit(data, transaction=None, item=None)

K-optimal rule discovery from input data, based on some user-specified measure.

Parameters
dataDataFrame

Input data for k-optimal(association) rule discovery.

transactionstr, optional

Column name of transaction ID in the input data.

Defaults to name of the 1st column if not provided.

itemstr, optional

Column name of item ID (or items) in the input data.

Defaults to the name of the last non-transaction column if not provided.

property fit_hdbprocedure

Returns the generated hdbprocedure for fit.

property predict_hdbprocedure

Returns the generated hdbprocedure for predict.

Inherited Methods from PALBase

Besides those methods mentioned above, the KORD class also inherits methods from PALBase class, please refer to PAL Base for more details.