KORD
- class hana_ml.algorithms.pal.association.KORD(k=None, measure=None, min_support=None, min_confidence=None, min_coverage=None, min_measure=None, max_antec=None, epsilon=None, use_epsilon=None, max_conseq=None)
K-optimal rule discovery (KORD) follows the idea of generating association rules with respect to a well-defined measure, instead of first finding all frequent itemsets and then generating all possible rules.
- Parameters
- k : int, optional
The number of top rules to discover.
- measure : {'leverage', 'lift', 'coverage', 'confidence'}, optional
Specifies the measure used to define the priority of the association rules.
Defaults to 'leverage'.
- min_support : float, optional
User-specified minimum support value of association rule, with valid range [0, 1].
Defaults to 0 if not provided.
- min_confidence : float, optional
User-specified minimum confidence value of association rule, with valid range [0, 1].
Defaults to 0 if not provided.
- min_coverage : float, optional
User-specified minimum coverage value of association rule, with valid range [0, 1].
Defaults to the value of min_support if not provided.
- min_measure : float, optional
User-specified minimum measure value (for leverage or lift, depending on the setting of measure).
Defaults to 0 if not provided.
- max_antec : int, optional
Specifies the maximum number of antecedent items in generated association rules.
Defaults to 4.
- epsilon : float, optional
User-specified epsilon value for penalizing the length of rules.
Valid only when use_epsilon is True.
- use_epsilon : bool, optional
Specifies whether or not to use epsilon to penalize the length of rules.
Defaults to False.
- max_conseq : int, optional
Specifies the maximum number of consequent items in generated association rules.
Should not be greater than 3.
Defaults to 1.
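To make the roles of k, measure, min_support, min_confidence and max_antec concrete, the following is a brute-force sketch in plain Python. It is not the PAL algorithm and does not use hana_ml: it simply scores every rule with a single consequent item (as with max_conseq=1) using the standard textbook definitions of the four measures and keeps the k best. The function name top_k_rules and the toy transactions are hypothetical, and min_coverage, min_measure and epsilon are left out for brevity.

```python
from itertools import combinations


def top_k_rules(transactions, k=5, measure="leverage",
                min_support=0.0, min_confidence=0.0, max_antec=4):
    """Brute-force illustration only: score every rule antecedent -> one item."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def frac(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(1, max_antec + 1):
        for antec in combinations(items, size):
            antec = frozenset(antec)
            coverage = frac(antec)              # support of the antecedent
            if coverage == 0:
                continue
            for conseq in items:
                if conseq in antec:
                    continue
                support = frac(antec | {conseq})
                if support == 0:
                    continue
                confidence = support / coverage
                if support < min_support or confidence < min_confidence:
                    continue
                conseq_support = frac({conseq})
                scores = {
                    "coverage": coverage,
                    "confidence": confidence,
                    "lift": confidence / conseq_support,
                    "leverage": support - coverage * conseq_support,
                }
                rules.append((scores[measure], sorted(antec), conseq))
    # Keep the k rules with the highest value of the chosen measure.
    return sorted(rules, key=lambda r: -r[0])[:k]


# Hypothetical toy data: each transaction is a set of items.
toy = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
best = top_k_rules(toy, k=1, measure="confidence")
```

Changing measure changes only the ranking key, while min_support and min_confidence act as filters applied before ranking.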
Examples
First let us have a look at the training data:
>>> df.head(10).collect()
   CUSTOMER   ITEM
0         2  item2
1         2  item3
2         3  item1
3         3  item2
4         3  item4
5         4  item1
6         4  item3
7         5  item2
8         5  item3
9         6  item1
Set up a KORD instance:
>>> from hana_ml.algorithms.pal.association import KORD
>>> krd = KORD(k=5, measure='lift', min_support=0.1, min_confidence=0.2, epsilon=0.1, use_epsilon=False)
Start k-optimal rule discovery process from the input transaction data, and check the results:
>>> krd.fit(data=df, transaction='CUSTOMER', item='ITEM')
>>> krd.antec_.collect()
   RULE_ID ANTECEDENT_RULE
0        0           item2
1        1           item1
2        2           item2
3        2           item1
4        3           item5
5        4           item2
>>> krd.conseq_.collect()
   RULE_ID CONSEQUENT_RULE
0        0           item5
1        1           item5
2        2           item5
3        3           item1
4        4           item4
>>> krd.stats_.collect()
   RULE_ID   SUPPORT  CONFIDENCE      LIFT  LEVERAGE   MEASURE
0        0  0.222222    0.285714  1.285714  0.049383  1.285714
1        1  0.222222    0.333333  1.500000  0.074074  1.500000
2        2  0.222222    0.500000  2.250000  0.123457  2.250000
3        3  0.222222    1.000000  1.500000  0.074074  1.500000
4        4  0.222222    0.285714  1.285714  0.049383  1.285714
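The antecedent and consequent tables are linked by RULE_ID, and a rule with several antecedent items (rule 2 in this output) spans several rows. The sketch below shows one way to assemble readable rules from such rows in plain Python. The helper assemble_rules is hypothetical and not part of hana_ml; the rows are copied by hand from the output above, whereas in practice they could be taken from the collected tables, for example with itertuples(index=False).

```python
from collections import defaultdict


def assemble_rules(antec_rows, conseq_rows):
    """Group (RULE_ID, item) rows into {rule_id: (antecedents, consequents)}."""
    antec, conseq = defaultdict(list), defaultdict(list)
    for rule_id, item in antec_rows:
        antec[rule_id].append(item)
    for rule_id, item in conseq_rows:
        conseq[rule_id].append(item)
    return {rid: (antec[rid], conseq[rid]) for rid in sorted(conseq)}


# Rows copied from the example output (RULE_ID, item).
antec_rows = [(0, "item2"), (1, "item1"), (2, "item2"),
              (2, "item1"), (3, "item5"), (4, "item2")]
conseq_rows = [(0, "item5"), (1, "item5"), (2, "item5"),
               (3, "item1"), (4, "item4")]

rules = assemble_rules(antec_rows, conseq_rows)
for rid, (lhs, rhs) in rules.items():
    # Print each rule in the form {antecedents} -> {consequents}.
    print(f"rule {rid}: {{{', '.join(lhs)}}} -> {{{', '.join(rhs)}}}")
```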
- Attributes
- antec_ : DataFrame
Info of antecedent items for the mined association rules, structured as follows:
1st column : rule ID,
2nd column : antecedent items.
- conseq_ : DataFrame
Info of consequent items for the mined association rules, structured as follows:
1st column : rule ID,
2nd column : consequent items.
- stats_ : DataFrame
Some basic statistics for the mined association rules, structured as follows:
1st column : rule ID,
2nd column : support value of rules,
3rd column : confidence value of rules,
4th column : lift value of rules,
5th column : leverage value of rules,
6th column : measure value of rules.
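As a reading aid for the statistics columns, the sketch below computes support, confidence, lift and leverage from raw transaction counts using the standard textbook definitions. The function rule_measures is hypothetical, and the counts are an assumption: the full dataset is not shown here, so they were chosen (9 transactions, 7 containing the antecedent, 2 containing the consequent, 2 containing both) because they reproduce the values listed for rule 0 in the example output.

```python
def rule_measures(n_total, n_antec, n_conseq, n_both):
    """Standard rule statistics from transaction counts (illustrative only)."""
    coverage = n_antec / n_total          # share of transactions with the antecedent
    support = n_both / n_total            # share with antecedent and consequent
    confidence = n_both / n_antec         # support / coverage
    conseq_support = n_conseq / n_total   # share with the consequent
    return {
        "coverage": coverage,
        "support": support,
        "confidence": confidence,
        "lift": confidence / conseq_support,
        "leverage": support - coverage * conseq_support,
    }


# Assumed counts, picked to match rule 0 of the example output.
m = rule_measures(n_total=9, n_antec=7, n_conseq=2, n_both=2)
print({name: round(value, 6) for name, value in m.items()})
# support 0.222222, confidence 0.285714, lift 1.285714, leverage 0.049383
```

With measure='lift', as in the example, the MEASURE column repeats the LIFT column.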
Methods
fit(data[, transaction, item]): K-optimal rule discovery from input data, based on some user-specified measure.
- fit(data, transaction=None, item=None)
K-optimal rule discovery from input data, based on some user-specified measure.
- Parameters
- data : DataFrame
Input data for k-optimal (association) rule discovery.
- transaction : str, optional
Column name of transaction ID in the input data.
Defaults to the name of the first column if not provided.
- item : str, optional
Column name of item ID (or items) in the input data.
Defaults to the name of the last non-transaction column if not provided.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.