KORD
- class hana_ml.algorithms.pal.association.KORD(k=None, measure=None, min_support=None, min_confidence=None, min_coverage=None, min_measure=None, max_antec=None, epsilon=None, use_epsilon=None, max_conseq=None)
K-optimal rule discovery (KORD) follows the idea of generating association rules with respect to a well-defined measure, instead of first finding all frequent itemsets and then generating all possible rules.
- Parameters:
- k : int, optional
The number of top rules to discover.
- measure : {'leverage', 'lift', 'coverage', 'confidence'}, optional
Specifies the measure used to define the priority of the association rules.
Defaults to 'leverage'.
- min_support : float, optional
User-specified minimum support value of association rules, with valid range [0, 1].
Defaults to 0 if not provided.
- min_confidence : float, optional
User-specified minimum confidence value of association rules, with valid range [0, 1].
Defaults to 0 if not provided.
- min_coverage : float, optional
User-specified minimum coverage value of association rules, with valid range [0, 1].
Defaults to the value of min_support if not provided.
- min_measure : float, optional
User-specified minimum measure value (leverage or lift, depending on the setting of measure).
Defaults to 0 if not provided.
- max_antec : int, optional
Specifies the maximum number of antecedent items in generated association rules.
Defaults to 4.
- epsilon : float, optional
User-specified epsilon value for penalizing the length of rules.
Valid only when use_epsilon is True.
- use_epsilon : bool, optional
Specifies whether or not to use epsilon to penalize the length of rules.
Defaults to False.
- max_conseq : int, optional
Specifies the maximum number of consequent items in generated association rules.
Should not be greater than 3.
Defaults to 1.
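To make the thresholds above concrete, the following is a minimal pure-Python sketch of how support, coverage, confidence, lift and leverage are conventionally defined, together with a brute-force top-k selection. It is an illustration only: the toy transactions and the helper names (support, rule_stats, top_k_rules) are hypothetical, the definitions are the standard textbook ones and are assumed rather than confirmed to match PAL's, and the exhaustive enumeration is not the pruned search that KORD performs. Consequents are restricted to a single item, and epsilon is not modelled.

```python
from itertools import combinations
import heapq

# Hypothetical toy transactions (not the data used in the Examples section).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_stats(antecedent, consequent, transactions):
    """Textbook statistics for the rule antecedent -> consequent."""
    coverage = support(antecedent, transactions)           # support of the antecedent
    supp = support(antecedent | consequent, transactions)  # support of the whole rule
    cons_supp = support(consequent, transactions)
    confidence = supp / coverage if coverage else 0.0
    lift = confidence / cons_supp if cons_supp else 0.0
    leverage = supp - coverage * cons_supp
    return {
        "antecedent": antecedent,
        "consequent": consequent,
        "support": supp,
        "coverage": coverage,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
    }


def top_k_rules(transactions, k, measure="leverage", min_support=0.0,
                min_confidence=0.0, min_coverage=None, max_antec=4):
    """Brute-force illustration: score every rule, filter, keep the best k."""
    if min_coverage is None:
        # Mirrors the documented default: min_coverage falls back to min_support.
        min_coverage = min_support
    items = sorted(set().union(*transactions))
    candidates = []
    for size in range(1, max_antec + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            for item in items:
                if item in antecedent:
                    continue
                stats = rule_stats(antecedent, frozenset([item]), transactions)
                if (stats["support"] >= min_support
                        and stats["confidence"] >= min_confidence
                        and stats["coverage"] >= min_coverage):
                    candidates.append(stats)
    return heapq.nlargest(k, candidates, key=lambda s: s[measure])


if __name__ == "__main__":
    for rule in top_k_rules(TRANSACTIONS, k=3, measure="lift", min_support=0.2):
        print(sorted(rule["antecedent"]), "->", sorted(rule["consequent"]),
              round(rule["lift"], 4))
```

Raising min_support, min_confidence or min_coverage shrinks the candidate set before ranking, while k and measure only control which of the surviving rules are reported.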
Examples
First let us have a look at the training data:
>>> df.head(10).collect()
   CUSTOMER   ITEM
0         2  item2
1         2  item3
2         3  item1
3         3  item2
4         3  item4
5         4  item1
6         4  item3
7         5  item2
8         5  item3
9         6  item1
Set up a KORD instance:
>>> krd = KORD(k=5, measure='lift', min_support=0.1, min_confidence=0.2, epsilon=0.1, use_epsilon=False)
Start the k-optimal rule discovery process on the input transaction data, and check the results:
>>> krd.fit(data=df, transaction='CUSTOMER', item='ITEM')
>>> krd.antec_.collect()
   RULE_ID ANTECEDENT_RULE
0        0           item2
1        1           item1
2        2           item2
3        2           item1
4        3           item5
5        4           item2
>>> krd.conseq_.collect()
   RULE_ID CONSEQUENT_RULE
0        0           item5
1        1           item5
2        2           item5
3        3           item1
4        4           item4
>>> krd.stats_.collect()
   RULE_ID   SUPPORT  CONFIDENCE      LIFT  LEVERAGE   MEASURE
0        0  0.222222    0.285714  1.285714  0.049383  1.285714
1        1  0.222222    0.333333  1.500000  0.074074  1.500000
2        2  0.222222    0.500000  2.250000  0.123457  2.250000
3        3  0.222222    1.000000  1.500000  0.074074  1.500000
4        4  0.222222    0.285714  1.285714  0.049383  1.285714
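Because antecedents, consequents and statistics come back in three separate tables keyed by RULE_ID, it can help to stitch them into one readable rule list. The sketch below does this in plain Python using the values printed above; the helper name assemble_rules is hypothetical, and the rows are typed in by hand. With a live connection, similar tuples could presumably be obtained from the collected pandas frames, for example via itertuples(index=False), but that step is an assumption and is not shown.

```python
from collections import defaultdict

# Rows copied from the example output above: (RULE_ID, item).
antec_rows = [(0, "item2"), (1, "item1"), (2, "item2"),
              (2, "item1"), (3, "item5"), (4, "item2")]
conseq_rows = [(0, "item5"), (1, "item5"), (2, "item5"),
               (3, "item1"), (4, "item4")]
# (RULE_ID, SUPPORT, CONFIDENCE, LIFT, LEVERAGE, MEASURE)
stats_rows = [
    (0, 0.222222, 0.285714, 1.285714, 0.049383, 1.285714),
    (1, 0.222222, 0.333333, 1.500000, 0.074074, 1.500000),
    (2, 0.222222, 0.500000, 2.250000, 0.123457, 2.250000),
    (3, 0.222222, 1.000000, 1.500000, 0.074074, 1.500000),
    (4, 0.222222, 0.285714, 1.285714, 0.049383, 1.285714),
]


def assemble_rules(antec_rows, conseq_rows, stats_rows):
    """Group items by rule ID and attach the statistics, best measure first."""
    antecedents = defaultdict(list)
    consequents = defaultdict(list)
    for rule_id, item in antec_rows:
        antecedents[rule_id].append(item)
    for rule_id, item in conseq_rows:
        consequents[rule_id].append(item)

    rules = []
    for rule_id, supp, conf, lift, leverage, measure in stats_rows:
        rules.append({
            "rule_id": rule_id,
            "antecedent": tuple(antecedents[rule_id]),
            "consequent": tuple(consequents[rule_id]),
            "support": supp,
            "confidence": conf,
            "lift": lift,
            "leverage": leverage,
            "measure": measure,
        })
    # sorted() is stable, so rules with equal measure keep their original order.
    return sorted(rules, key=lambda r: r["measure"], reverse=True)


if __name__ == "__main__":
    for rule in assemble_rules(antec_rows, conseq_rows, stats_rows):
        print(f"{{{', '.join(rule['antecedent'])}}} -> "
              f"{{{', '.join(rule['consequent'])}}}  "
              f"measure={rule['measure']}")
```

Note that rule 2 has two antecedent rows (item2 and item1), which is why antec_ has six rows while conseq_ and stats_ have five.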
- Attributes:
- antec_ : DataFrame
Info of antecedent items for the mined association rules, structured as follows:
1st column : rule ID,
2nd column : antecedent items.
- conseq_ : DataFrame
Info of consequent items for the mined association rules, structured as follows:
1st column : rule ID,
2nd column : consequent items.
- stats_ : DataFrame
Some basic statistics for the mined association rules, structured as follows:
1st column : rule ID,
2nd column : support value of rules,
3rd column : confidence value of rules,
4th column : lift value of rules,
5th column : leverage value of rules,
6th column : measure value of rules.
Methods
fit(data[, transaction, item]): K-optimal rule discovery from input data, based on some user-specified measure.
- fit(data, transaction=None, item=None)
K-optimal rule discovery from input data, based on some user-specified measure.
- Parameters:
- data : DataFrame
Input data for k-optimal (association) rule discovery.
- transaction : str, optional
Column name of transaction ID in the input data.
Defaults to the name of the 1st column if not provided.
- item : str, optional
Column name of item ID (or items) in the input data.
Defaults to the name of the last non-transaction column if not provided.
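The column defaults described above, and the long (one row per transaction/item pair) layout used in the example data, can be sketched in plain Python as follows. Both helpers, resolve_columns and to_long_format, are hypothetical names introduced for illustration; they mirror the documented behaviour but are not part of hana_ml and say nothing about how the library implements it internally.

```python
def resolve_columns(columns, transaction=None, item=None):
    """Pick the transaction and item columns following the documented defaults."""
    if transaction is None:
        # Documented default: the first column.
        transaction = columns[0]
    if item is None:
        # Documented default: the last column that is not the transaction column.
        remaining = [c for c in columns if c != transaction]
        item = remaining[-1]
    return transaction, item


def to_long_format(baskets):
    """Turn {transaction_id: [items]} into (transaction_id, item) rows."""
    return [(tid, item) for tid, items in baskets.items() for item in items]


if __name__ == "__main__":
    print(resolve_columns(["CUSTOMER", "ITEM"]))
    print(resolve_columns(["ID", "CUSTOMER", "ITEM"], transaction="CUSTOMER"))
    # Illustrative baskets in the same shape as the example data.
    print(to_long_format({2: ["item2", "item3"], 3: ["item1", "item2", "item4"]}))
```

Passing transaction and item explicitly, as the example call to fit does, avoids depending on column order.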
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.
Inherited Methods from PALBase
Besides the methods mentioned above, the KORD class also inherits methods from the PALBase class. Please refer to PAL Base for more details.