AprioriLite
- class hana_ml.algorithms.pal.association.AprioriLite(min_support, min_confidence, subsample=None, recalculate=None, thread_ratio=None, timeout=None, pmml_export=None)
This class runs a lightweight version of the Apriori algorithm for association rule mining. It reduces computational overhead by generating and analyzing only item sets of at most two items, which makes it particularly useful for large datasets where the full Apriori algorithm could be computationally expensive.
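To make the restriction to two-item sets concrete, here is a minimal pure-Python sketch of mining rules with one antecedent item and one consequent item. It is an illustration only, not the PAL implementation: the function name `mine_pair_rules`, the toy data, and the choice of inclusive thresholds (`>=`) are assumptions, and the statistics follow the standard textbook definitions of support, confidence, and lift.

```python
from collections import defaultdict
from itertools import combinations


def mine_pair_rules(rows, min_support, min_confidence):
    """Mine one-to-one rules from (transaction, item) rows (illustrative only)."""
    # Group items by transaction ID.
    baskets = defaultdict(set)
    for transaction, item in rows:
        baskets[transaction].add(item)
    n = len(baskets)

    # Count one-item and two-item sets only; larger sets are never generated.
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in baskets.values():
        for item in items:
            item_count[item] += 1
        for pair in combinations(sorted(items), 2):
            pair_count[pair] += 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:  # assumption: threshold is inclusive
            continue
        # Each frequent pair yields up to two rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_count[antecedent]
            if confidence >= min_confidence:
                lift = confidence / (item_count[consequent] / n)
                rules.append((antecedent, consequent, support, confidence, lift))
    return rules


# Hypothetical toy data: three transactions over three items.
rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"), (2, "eggs"), (3, "bread")]
for rule in mine_pair_rules(rows, min_support=0.5, min_confidence=0.6):
    print(rule)
```

Because only single items and pairs are counted, the work per transaction grows with the square of the basket size rather than with the number of all possible item sets.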
- Parameters:
- min_support : float
Specifies the user-defined minimum support threshold.
- min_confidence : float
Specifies the user-defined minimum confidence threshold.
- subsample : float, optional
Specifies the proportion of the input data to sample. Set to 1 to use the entire dataset. Subsampling can speed up computation on large datasets.
Defaults to 1.
- recalculate : bool, optional
If True, the statistics (support, confidence, and lift) of the resulting rule set are recalculated (updated) after the rules have been found using sampled data.
Defaults to True.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 means a single thread is used, while 1 means all currently available threads are used. Values outside this range are ignored, and the number of threads is then determined heuristically.
Defaults to 0.
- timeout : int, optional
Specifies the maximum run time for the algorithm, in seconds. The algorithm stops computing if the specified timeout is exceeded.
Defaults to 3600.
- pmml_export : {'no', 'single-row', 'multi-row'}, optional
Defines the method of exporting the Apriori model:
'no' : the model will not be exported,
'single-row' : the Apriori model will be exported as a single row PMML,
'multi-row' : the Apriori model will be exported as a multi-row PMML where each row contains a minimum of 5000 characters.
Defaults to 'no'.
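The interplay of subsample and recalculate can be pictured with a small pure-Python sketch. It assumes that recalculation means recomputing a rule's statistics against the complete data after the rule was found on a sample; the source does not spell out this detail, and the helper names `sample_baskets` and `pair_support`, the seed, and the toy baskets are hypothetical.

```python
import random


def pair_support(baskets, a, b):
    """Fraction of baskets that contain both items."""
    return sum(1 for items in baskets if a in items and b in items) / len(baskets)


def sample_baskets(baskets, subsample, seed=0):
    """Keep roughly `subsample` (0 < subsample <= 1) of the baskets."""
    if subsample >= 1:
        return list(baskets)
    rng = random.Random(seed)
    k = max(1, round(len(baskets) * subsample))
    return rng.sample(list(baskets), k)


# Hypothetical toy data: eight transactions.
baskets = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"},
           {"a", "b", "c"}, {"c"}, {"a", "c"}, {"b", "c"}]

sample = sample_baskets(baskets, subsample=0.5)
estimated = pair_support(sample, "a", "b")      # statistic as seen on the sample
recalculated = pair_support(baskets, "a", "b")  # statistic updated on all data
print(len(sample), estimated, recalculated)
```

Under this assumption, sampling decides which rules are discovered, while recalculation only affects the statistics reported for them.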
Examples
Input DataFrame df:
>>> df.collect()
    CUSTOMER   ITEM
0          2  item2
1          2  item3
......
21         8  item2
22         8  item3
Initialize an AprioriLite object:
>>> from hana_ml.algorithms.pal.association import AprioriLite
>>> apl = AprioriLite(min_support=0.1, min_confidence=0.3, subsample=1.0, recalculate=False, timeout=3600, pmml_export='single-row')
Perform the fit() and obtain the result:
>>> apl.fit(data=df)
>>> apl.result_.head(5).collect()
  ANTECEDENT CONSEQUENT   SUPPORT  CONFIDENCE      LIFT
0      item5      item2  0.222222    1.000000  1.285714
1      item1      item5  0.222222    0.333333  1.500000
2      item5      item1  0.222222    1.000000  1.500000
3      item5      item3  0.111111    0.500000  0.750000
4      item1      item2  0.444444    0.666667  0.857143
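The three statistics in the result rows are related. Under the standard textbook definitions, confidence = support(A and B) / support(A) and lift = confidence / support(B), so each row implies a support value for its antecedent and its consequent. The sketch below copies the five rows from the example output and checks that these implied values agree across rows. It runs locally on plain tuples and does not call hana_ml; the helper name `implied_item_supports` is hypothetical.

```python
# Rows copied from the example output:
# (antecedent, consequent, support, confidence, lift).
rules = [
    ("item5", "item2", 0.222222, 1.000000, 1.285714),
    ("item1", "item5", 0.222222, 0.333333, 1.500000),
    ("item5", "item1", 0.222222, 1.000000, 1.500000),
    ("item5", "item3", 0.111111, 0.500000, 0.750000),
    ("item1", "item2", 0.444444, 0.666667, 0.857143),
]


def implied_item_supports(rules):
    """Collect the single-item supports implied by each rule's statistics."""
    implied = {}
    for antecedent, consequent, support, confidence, lift in rules:
        # support(A) = support(A and B) / confidence
        implied.setdefault(antecedent, []).append(support / confidence)
        # support(B) = confidence / lift
        implied.setdefault(consequent, []).append(confidence / lift)
    return implied


for item, values in sorted(implied_item_supports(rules).items()):
    print(item, [round(v, 3) for v in values])
```

For every item, the values derived from different rows match up to rounding, which is a quick sanity check when reading a rule table.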
- Attributes:
- result_ : DataFrame
Mined association rules and related statistics, structured as follows:
1st column : antecedent (leading) items,
2nd column : consequent (dependent) items,
3rd column : support value,
4th column : confidence value,
5th column : lift value.
Non-empty only when relational is False.
- model_ : DataFrame
Apriori model trained from the input data, structured as follows:
1st column : model ID,
2nd column : model content, i.e. liteApriori model in PMML format.
Methods
fit(data[, transaction, item]) : Association rule mining on the given data.
- fit(data, transaction=None, item=None)
Performs association rule mining on the given data.
- Parameters:
- data : DataFrame
The input data.
- transaction : str, optional
Name of the transaction column.
Defaults to the first column if not provided.
- item : str, optional
Name of the item column.
Defaults to the last non-transaction column if not provided.
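The documented defaults for transaction and item can be mirrored in a few lines of plain Python. This is a sketch of the rule as described above, not code taken from hana_ml; `resolve_columns` is a hypothetical helper that works on a list of column names.

```python
def resolve_columns(columns, transaction=None, item=None):
    """Hypothetical helper mirroring the documented defaults of fit()."""
    if transaction is None:
        # Default: the first column.
        transaction = columns[0]
    if item is None:
        # Default: the last column that is not the transaction column.
        item = [c for c in columns if c != transaction][-1]
    return transaction, item


print(resolve_columns(["CUSTOMER", "ITEM"]))
print(resolve_columns(["ITEM", "CUSTOMER"], transaction="CUSTOMER"))
```

With the two-column example DataFrame, both calls resolve to CUSTOMER as the transaction column and ITEM as the item column, so passing the names explicitly only matters when the column order differs or extra columns are present.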
Inherited Methods from PALBase
Besides the methods mentioned above, the AprioriLite class also inherits methods from the PALBase class. Please refer to PAL Base for more details.