AprioriLite
- class hana_ml.algorithms.pal.association.AprioriLite(min_support, min_confidence, subsample=None, recalculate=None, thread_ratio=None, timeout=None, pmml_export=None)
A light version of the Apriori algorithm for association rule mining, in which only large (frequent) item sets of size two are calculated.
- Parameters
- min_support : float
User-specified minimum support (actual value).
- min_confidence : float
User-specified minimum confidence (actual value).
- subsample : float, optional
Specifies the sampling rate for the input data. Set to 1 to use the entire data set.
- recalculate : bool, optional
If the input data is sampled, this parameter indicates whether to use the remaining data to update the related statistics, i.e. support, confidence and lift.
Defaults to True.
- thread_ratio : float, optional
Specifies the ratio of the total number of threads that this function can use.
The value range is from 0 to 1, where 0 means using only 1 thread and 1 means using at most all currently available threads.
Values outside this range are ignored, and the function determines the number of threads heuristically.
Defaults to 0.
- timeout : int, optional
Specifies the maximum run time in seconds.
The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.
- pmml_export : {'no', 'single-row', 'multi-row'}, optional
Specifies how to export the Apriori model:
'no' : do not export the model,
'single-row' : export the Apriori model in PMML in a single row,
'multi-row' : export the Apriori model in PMML in multiple rows, while the minimum length of each row is 5000 characters.
Defaults to 'no'.
Examples
Input data for association rule mining using Apriori algorithm:
>>> df.collect()
    CUSTOMER   ITEM
0          2  item2
1          2  item3
2          3  item1
3          3  item2
4          3  item4
5          4  item1
6          4  item3
7          5  item2
8          5  item3
9          6  item1
10         6  item3
11         0  item1
12         0  item2
13         0  item5
14         1  item2
15         1  item4
16         7  item1
17         7  item2
18         7  item3
19         7  item5
20         8  item1
21         8  item2
22         8  item3
Set up parameters for light Apriori algorithm, ingest the input data, and check the result table:
>>> apl = AprioriLite(min_support=0.1,
...                   min_confidence=0.3,
...                   subsample=1.0,
...                   recalculate=False,
...                   timeout=3600,
...                   pmml_export='single-row')
>>> apl.fit(data=df)
>>> apl.result_.head(5).collect()
  ANTECEDENT CONSEQUENT   SUPPORT  CONFIDENCE      LIFT
0      item5      item2  0.222222    1.000000  1.285714
1      item1      item5  0.222222    0.333333  1.500000
2      item5      item1  0.222222    1.000000  1.500000
3      item5      item3  0.111111    0.500000  0.750000
4      item1      item2  0.444444    0.666667  0.857143
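The statistics in the result table can be checked by hand. The following is a minimal pure-Python sketch, not the PAL implementation: it assumes the standard definitions (support = share of transactions containing both items, confidence = pair count divided by antecedent count, lift = confidence divided by the consequent's support) and uses a hypothetical helper named pair_rules that is not part of hana_ml.

```python
from collections import defaultdict
from itertools import permutations

# (CUSTOMER, ITEM) rows copied from the example input data.
rows = [
    (2, "item2"), (2, "item3"),
    (3, "item1"), (3, "item2"), (3, "item4"),
    (4, "item1"), (4, "item3"),
    (5, "item2"), (5, "item3"),
    (6, "item1"), (6, "item3"),
    (0, "item1"), (0, "item2"), (0, "item5"),
    (1, "item2"), (1, "item4"),
    (7, "item1"), (7, "item2"), (7, "item3"), (7, "item5"),
    (8, "item1"), (8, "item2"), (8, "item3"),
]


def pair_rules(rows, min_support, min_confidence):
    """Hypothetical helper: mine single-item -> single-item rules."""
    # Group items into one set per transaction (customer).
    baskets = defaultdict(set)
    for customer, item in rows:
        baskets[customer].add(item)
    n = len(baskets)

    # Count transactions per item and per ordered item pair.
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in baskets.values():
        for item in items:
            item_count[item] += 1
        for a, b in permutations(sorted(items), 2):
            pair_count[(a, b)] += 1

    # Keep only rules that meet both thresholds.
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / item_count[a]
        lift = confidence / (item_count[b] / n)
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence, lift)
    return rules


rules = pair_rules(rows, min_support=0.1, min_confidence=0.3)
for a, b in [("item5", "item2"), ("item1", "item5"), ("item1", "item2")]:
    s, c, l = rules[(a, b)]
    print(f"{a} -> {b}: support={s:.6f} confidence={c:.6f} lift={l:.6f}")
```

With the nine transactions above, the printed values agree with rows 0, 1 and 4 of the result table. The sketch ignores subsample and recalculate, which correspond to subsample=1.0 in the example.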
- Attributes
- result_ : DataFrame
Mined association rules and related statistics, structured as follows:
1st column : antecedent (leading) items,
2nd column : consequent (dependent) items,
3rd column : support value,
4th column : confidence value,
5th column : lift value.
Non-empty only when relational is False.
- model_ : DataFrame
Apriori model trained from the input data, structured as follows:
1st column : model ID,
2nd column : model content, i.e. liteApriori model in PMML format.
Methods
fit(data[, transaction, item]) : Association rule mining based on the input data.
- fit(data, transaction=None, item=None)
Association rule mining based on the input data.
- Parameters
- data : DataFrame
Input data for association rule mining.
- transaction : str, optional
Name of the transaction column.
Defaults to the first column if not provided.
- item : str, optional
Name of the item column.
Defaults to the last non-transaction column if not provided.
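The column defaults described above can be illustrated with a small sketch. The helper resolve_columns below is hypothetical and not part of hana_ml; it only mirrors the documented rules (first column for transaction, last non-transaction column for item) on a plain list of column names.

```python
def resolve_columns(columns, transaction=None, item=None):
    """Hypothetical helper mirroring the documented defaults of fit()."""
    # The transaction column defaults to the first column.
    if transaction is None:
        transaction = columns[0]
    # The item column defaults to the last column that is not
    # the transaction column.
    if item is None:
        item = [c for c in columns if c != transaction][-1]
    return transaction, item


# Two-column input as in the example: both defaults apply.
print(resolve_columns(["CUSTOMER", "ITEM"]))
# Extra leading column: name the transaction column explicitly.
print(resolve_columns(["ID", "CUSTOMER", "ITEM"], transaction="CUSTOMER"))
```

Both calls resolve to CUSTOMER as the transaction column and ITEM as the item column. When the input has more than two columns, passing transaction and item explicitly avoids relying on column order.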
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.