AprioriLite

class hana_ml.algorithms.pal.association.AprioriLite(min_support, min_confidence, subsample=None, recalculate=None, thread_ratio=None, timeout=None, pmml_export=None)

A light version of the Apriori algorithm for association rule mining, in which only large item sets of two items are calculated.

Parameters:

min_support : float

User-specified minimum support (actual value).

min_confidence : float

User-specified minimum confidence (actual value).

subsample : float, optional

Specifies the fraction of the input data to sample. Set to 1 to use the entire data set.

recalculate : bool, optional

If you sample the input data, this parameter indicates whether or not to use the remaining data to update the related statistics, i.e. support, confidence and lift.

Defaults to True.

thread_ratio : float, optional

Specifies the ratio of the total number of threads that can be used by this function.

The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads.

Values outside the range will be ignored and this function heuristically determines the number of threads to use.

Defaults to 0.

timeout : int, optional

Specifies the maximum run time in seconds.

The algorithm will stop running when the specified timeout is reached.

Defaults to 3600.

pmml_export : {'no', 'single-row', 'multi-row'}, optional

Specifies how the Apriori model is exported:

'no' : do not export the model,

'single-row' : export Apriori model in PMML in single row,

'multi-row' : export Apriori model in PMML in multiple rows, while the minimum length of each row is 5000 characters.

Defaults to 'no'.

Examples

Input data for association rule mining using the Apriori algorithm:

>>> df.collect()
    CUSTOMER   ITEM
           2  item2
           2  item3
           3  item1
           3  item2
           3  item4
           4  item1
           4  item3
           5  item2
           5  item3
           6  item1
           6  item3
           0  item1
           0  item2
           0  item5
           1  item2
           1  item4
           7  item1
           7  item2
           7  item3
           7  item5
           8  item1
           8  item2
           8  item3

Set up the parameters for the light Apriori algorithm, fit it to the input data, and check the result table:

>>> from hana_ml.algorithms.pal.association import AprioriLite
>>> apl = AprioriLite(min_support=0.1,
...                   min_confidence=0.3,
...                   subsample=1.0,
...                   recalculate=False,
...                   timeout=3600,
...                   pmml_export='single-row')
>>> apl.fit(data=df)
>>> apl.result_.head(5).collect()
  ANTECEDENT CONSEQUENT   SUPPORT  CONFIDENCE      LIFT
0      item5      item2  0.222222    1.000000  1.285714
1      item1      item5  0.222222    0.333333  1.500000
2      item5      item1  0.222222    1.000000  1.500000
3      item5      item3  0.111111    0.500000  0.750000
4      item1      item2  0.444444    0.666667  0.857143
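The statistics in the result table can be checked by hand. The following is a minimal pure-Python sketch, not the PAL implementation: it rebuilds the transactions from the input table above and computes support, confidence and lift for single-item rules using their standard definitions. The helper names (`support`, `rule_stats`, `mine_pairs`) are hypothetical, and treating the thresholds as inclusive (`>=`) is an assumption.

```python
from collections import defaultdict
from itertools import permutations

# (CUSTOMER, ITEM) rows copied from the example input table.
rows = [
    (2, "item2"), (2, "item3"),
    (3, "item1"), (3, "item2"), (3, "item4"),
    (4, "item1"), (4, "item3"),
    (5, "item2"), (5, "item3"),
    (6, "item1"), (6, "item3"),
    (0, "item1"), (0, "item2"), (0, "item5"),
    (1, "item2"), (1, "item4"),
    (7, "item1"), (7, "item2"), (7, "item3"), (7, "item5"),
    (8, "item1"), (8, "item2"), (8, "item3"),
]

# Group the items of each customer into one transaction (basket).
baskets = defaultdict(set)
for customer, item in rows:
    baskets[customer].add(item)
n_transactions = len(baskets)


def support(*items):
    """Fraction of transactions that contain all of the given items."""
    wanted = set(items)
    hits = sum(1 for basket in baskets.values() if wanted <= basket)
    return hits / n_transactions


def rule_stats(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    joint = support(antecedent, consequent)
    confidence = joint / support(antecedent)
    lift = confidence / support(consequent)
    return joint, confidence, lift


def mine_pairs(min_support, min_confidence):
    """Keep every single-item rule that meets both thresholds (assumed inclusive)."""
    items = sorted({item for _, item in rows})
    kept = {}
    for antecedent, consequent in permutations(items, 2):
        stats = rule_stats(antecedent, consequent)
        if stats[0] >= min_support and stats[1] >= min_confidence:
            kept[(antecedent, consequent)] = stats
    return kept


rules = mine_pairs(min_support=0.1, min_confidence=0.3)
for (antecedent, consequent), stats in sorted(rules.items()):
    print(antecedent, "->", consequent, ["%.6f" % value for value in stats])
```

For the rule item5 -> item2, for instance, both items appear together in 2 of the 9 transactions, item5 appears in 2 and item2 in 7, which gives the support, confidence and lift shown in row 0 of the result table.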

Attributes:

result_ : DataFrame

Mined association rules and related statistics, structured as follows:

1st column : antecedent (leading) items,
2nd column : consequent (dependent) items,
3rd column : support value,
4th column : confidence value,
5th column : lift value.

Non-empty only when relational is False.

model_ : DataFrame

Apriori model trained from the input data, structured as follows:

1st column : model ID.
2nd column : model content, i.e. liteApriori model in PMML format.
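As a small illustration of the five-column layout of result_ described above, the sketch below works on the five rules from the example output, assuming they have already been fetched into plain Python tuples in column order. It keeps the rules whose lift exceeds 1 and ranks them by lift. The variable names are hypothetical and nothing here calls the hana_ml API.

```python
# Rules copied from the example output, in result_ column order:
# (antecedent, consequent, support, confidence, lift)
rules = [
    ("item5", "item2", 0.222222, 1.000000, 1.285714),
    ("item1", "item5", 0.222222, 0.333333, 1.500000),
    ("item5", "item1", 0.222222, 1.000000, 1.500000),
    ("item5", "item3", 0.111111, 0.500000, 0.750000),
    ("item1", "item2", 0.444444, 0.666667, 0.857143),
]

LIFT = 4  # position of the lift value (5th column)

# A lift above 1 means the consequent is more likely when the antecedent is present.
positive = [rule for rule in rules if rule[LIFT] > 1.0]

# Rank the remaining rules from highest to lowest lift.
positive.sort(key=lambda rule: rule[LIFT], reverse=True)

for antecedent, consequent, support, confidence, lift in positive:
    print(f"{antecedent} -> {consequent}: lift={lift:.6f}, confidence={confidence:.6f}")
```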

Methods

fit(data[, transaction, item])

Mines association rules from the input data.

fit(data, transaction=None, item=None)

Mines association rules from the input data.

Parameters:

data : DataFrame

Input data for association rule mining.

transaction : str, optional

Name of the transaction column.

Defaults to the first column if not provided.

item : str, optional

Name of the item column.

Defaults to the last non-transaction column if not provided.
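The column defaults described above can be sketched as a small standalone function. `resolve_columns` is a hypothetical helper written only to illustrate the documented rule; it is not part of hana_ml, and the actual resolution inside fit() may differ in detail.

```python
def resolve_columns(columns, transaction=None, item=None):
    """Pick the transaction and item columns following the documented defaults.

    columns: ordered list of column names of the input data.
    """
    # The transaction column defaults to the first column.
    if transaction is None:
        transaction = columns[0]
    # The item column defaults to the last column that is not the transaction column.
    if item is None:
        remaining = [name for name in columns if name != transaction]
        item = remaining[-1]
    return transaction, item


# Two-column input as in the example: both defaults apply.
print(resolve_columns(["CUSTOMER", "ITEM"]))

# Explicit transaction column that is not the first column.
print(resolve_columns(["STORE", "CUSTOMER", "ITEM"], transaction="CUSTOMER"))
```

With the example table, calling fit(data=df) without `transaction` or `item` would therefore be expected to use CUSTOMER and ITEM respectively.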

property fit_hdbprocedure: Returns the generated hdbprocedure for fit.

property predict_hdbprocedure: Returns the generated hdbprocedure for predict.

Inherited Methods from PALBase

In addition to the methods described above, the AprioriLite class inherits methods from the PALBase class. Refer to PAL Base for more details.