AprioriLite

class hana_ml.algorithms.pal.association.AprioriLite(min_support, min_confidence, subsample=None, recalculate=None, thread_ratio=None, timeout=None, pmml_export=None)

This class runs a lightweight version of the Apriori algorithm for association rule mining. It reduces computational overhead by creating and analyzing only item sets of at most two items, which makes it useful for large datasets where the full Apriori algorithm would be computationally expensive.
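
To illustrate what restricting the search to two-item sets means, here is a minimal plain-Python sketch. It is not the PAL implementation: the function name `mine_pair_rules`, the toy data, and the inclusive threshold comparisons are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    built from item sets of size one and two only (hypothetical helper)."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # Pass 1: count single items and keep those meeting min_support.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    # Pass 2: count pairs of frequent items; larger item sets are never built.
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields up to two single-item rules.
        for ante, cons in ((a, b), (b, a)):
            confidence = count / item_counts[ante]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[cons] / n)
                rules.append((ante, cons, support, confidence, lift))
    return rules


# Hypothetical toy data: four transactions over three items.
toy = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
rules = mine_pair_rules(toy, min_support=0.5, min_confidence=0.6)
```

Because only single items and pairs are counted, the work per transaction grows with the square of its length rather than exponentially, which is the trade-off the lightweight variant makes.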

Parameters:
min_support : float

Specifies the user-defined minimum support threshold.

min_confidence : float

Specifies the user-defined minimum confidence threshold.

subsample : float, optional

Specifies the proportion of the input data to sample. Set to 1 to use the entire data set. Subsampling can speed up computation on large datasets.

Defaults to 1.

recalculate : bool, optional

If True, the statistics of the resulting rule set (support, confidence, and lift) are recalculated after the rules have been found from the sampled data.

Defaults to True.

thread_ratio : float, optional

Specifies the ratio of available threads to use, from 0 to 1. A value of 0 means a single thread, and 1 means all currently available threads. Values outside this range are ignored, and the number of threads is then determined heuristically.

Defaults to 0.

timeout : int, optional

Specifies the maximum run time for the algorithm, in seconds. The algorithm stops computing once the timeout is exceeded.

Defaults to 3600.

pmml_export : {'no', 'single-row', 'multi-row'}, optional

Specifies how the Apriori model is exported:

  • 'no' : the model is not exported,

  • 'single-row' : the Apriori model is exported as single-row PMML,

  • 'multi-row' : the Apriori model is exported as multi-row PMML, where each row contains a minimum of 5000 characters.

Defaults to 'no'.
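
The interplay of subsample and recalculate described above can be sketched in plain Python. This is an illustration under stated assumptions, not the PAL implementation: it assumes that recalculation means re-evaluating a rule's statistics against the full data, and the helper `rule_stats`, the toy transactions, and the fixed seed are all hypothetical.

```python
import random


def rule_stats(transactions, antecedent, consequent):
    """Support, confidence and lift of antecedent -> consequent
    over the given transactions (hypothetical helper)."""
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent in t)
    n_cons = sum(1 for t in transactions if consequent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift


# Hypothetical data: 10 transactions, each a set of items.
full = [{"a", "b"}] * 4 + [{"a"}] * 2 + [{"b"}] * 3 + [{"c"}]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
sample = rng.sample(full, k=5)  # stands in for subsample=0.5

# Statistics as seen on the sample only (analogous to recalculate=False).
sampled_stats = rule_stats(sample, "a", "b")
# Statistics re-evaluated on all transactions (analogous to recalculate=True).
full_stats = rule_stats(full, "a", "b")
```

The two tuples generally differ, which is why recalculating is useful when the reported statistics should describe the whole data set rather than the sample.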

Examples

Input DataFrame df:

>>> df.collect()
    CUSTOMER   ITEM
0          2  item2
1          2  item3
......
21         8  item2
22         8  item3

Initialize an AprioriLite object:

>>> from hana_ml.algorithms.pal.association import AprioriLite
>>> apl = AprioriLite(min_support=0.1,
                      min_confidence=0.3,
                      subsample=1.0,
                      recalculate=False,
                      timeout=3600,
                      pmml_export='single-row')

Call fit() and retrieve the result:

>>> apl.fit(data=df)
>>> apl.result_.head(5).collect()
  ANTECEDENT CONSEQUENT   SUPPORT  CONFIDENCE      LIFT
0      item5      item2  0.222222    1.000000  1.285714
1      item1      item5  0.222222    0.333333  1.500000
2      item5      item1  0.222222    1.000000  1.500000
3      item5      item3  0.111111    0.500000  0.750000
4      item1      item2  0.444444    0.666667  0.857143
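
As a sanity check on how to read these columns, the following sketch recovers the single-item supports implied by the rows above. It assumes the standard relations confidence = support(rule) / support(antecedent) and lift = confidence / support(consequent). The variable names are illustrative, and the numbers are copied from the output shown.

```python
# Rows copied from the example output:
# (antecedent, consequent, support, confidence, lift)
rows = [
    ("item5", "item2", 0.222222, 1.000000, 1.285714),
    ("item1", "item5", 0.222222, 0.333333, 1.500000),
    ("item5", "item1", 0.222222, 1.000000, 1.500000),
    ("item5", "item3", 0.111111, 0.500000, 0.750000),
    ("item1", "item2", 0.444444, 0.666667, 0.857143),
]

antecedent_support = {}
consequent_support = {}
for ante, cons, support, confidence, lift in rows:
    # support(A) = support(A -> B) / confidence(A -> B)
    antecedent_support.setdefault(ante, []).append(support / confidence)
    # support(B) = confidence(A -> B) / lift(A -> B)
    consequent_support.setdefault(cons, []).append(confidence / lift)

for item, values in sorted(antecedent_support.items()):
    print(item, "as antecedent:", [round(v, 4) for v in values])
for item, values in sorted(consequent_support.items()):
    print(item, "as consequent:", [round(v, 4) for v in values])
```

Every rule that shares an item implies the same support for that item (up to rounding), so the five rows are mutually consistent under these definitions.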

Attributes:
result_ : DataFrame
Mined association rules and related statistics, structured as follows:
  • 1st column : antecedent (leading) items,

  • 2nd column : consequent (dependent) items,

  • 3rd column : support value,

  • 4th column : confidence value,

  • 5th column : lift value.

Non-empty only when relational is False.

model_ : DataFrame
Apriori model trained from the input data, structured as follows:
  • 1st column : model ID,

  • 2nd column : model content, i.e. the liteApriori model in PMML format.
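
For reference, the support, confidence, and lift columns of result_ are assumed here to follow the standard association-rule definitions, with $T$ the set of transactions and $A \Rightarrow B$ a rule with antecedent $A$ and consequent $B$. This is a restatement of the textbook formulas, not a quotation from the PAL documentation.

```latex
\begin{align*}
\operatorname{support}(A \Rightarrow B)
  &= \frac{\lvert \{\, t \in T : A \cup B \subseteq t \,\} \rvert}{\lvert T \rvert} \\[4pt]
\operatorname{confidence}(A \Rightarrow B)
  &= \frac{\operatorname{support}(A \Rightarrow B)}{\operatorname{support}(A)} \\[4pt]
\operatorname{lift}(A \Rightarrow B)
  &= \frac{\operatorname{confidence}(A \Rightarrow B)}{\operatorname{support}(B)}
\end{align*}
```

Here $\operatorname{support}(A)$ is the fraction of transactions that contain $A$. A lift above 1 means the consequent appears more often together with the antecedent than it does overall, and a lift below 1 means it appears less often.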

Methods

fit(data[, transaction, item])

Performs association rule mining on the given data.

fit(data, transaction=None, item=None)

Performs association rule mining on the given data.

Parameters:
data : DataFrame

The input data.

transaction : str, optional

Name of the transaction column.

Defaults to the first column if not provided.

item : str, optional

Name of the item column.

Defaults to the last non-transaction column if not provided.
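
The column defaults described above can be sketched as a small helper. The function `resolve_columns` is hypothetical and not part of hana_ml. It only mirrors the stated rules: the transaction column falls back to the first column, and the item column falls back to the last column that is not the transaction column.

```python
def resolve_columns(columns, transaction=None, item=None):
    """Pick transaction and item column names using the documented defaults
    (hypothetical helper, for illustration only)."""
    if transaction is None:
        # Default: first column of the input data.
        transaction = columns[0]
    if item is None:
        # Default: last column that is not the transaction column.
        remaining = [c for c in columns if c != transaction]
        item = remaining[-1]
    return transaction, item


# With the two-column example DataFrame, no arguments are needed.
print(resolve_columns(["CUSTOMER", "ITEM"]))
# With extra columns, naming both explicitly avoids surprises.
print(resolve_columns(["CUSTOMER", "ITEM", "NOTE"], item="ITEM"))
```

When the input has more than two columns, passing both names explicitly is the safer choice, since the default item column is simply the last remaining one.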

Inherited Methods from PALBase

In addition to the methods described above, the AprioriLite class inherits methods from the PALBase class. Refer to PAL Base for details.