SPM

class hana_ml.algorithms.pal.association.SPM(min_support, relational=None, max_len=None, min_len=None, max_len_out=None, min_len_out=None, ubiquitous=None, calc_lift=None, timeout=None)

The sequential pattern mining algorithm searches for frequent patterns in sequence databases.

Parameters

min_support : float

User-specified minimum support value.

relational : bool, optional

Whether or not to apply relational logic in sequential pattern mining.

If False, a single results table for frequent pattern mining is produced; otherwise, the results are split into two tables: one for mined patterns and the other for statistics.

Defaults to False.

ubiquitous : float, optional

Items whose support values are above this specified value will be ignored during the frequent item mining phase.

Defaults to 1.0.

min_len : int, optional

Minimum number of items in a transaction.

Defaults to 1.

max_len : int, optional

Maximum number of items in a transaction.

Defaults to 10.

min_len_out : int, optional

Specifies the minimum number of items of the mined association rules in the result table.

Defaults to 1.

max_len_out : int, optional

Specifies the maximum number of items of the mined association rules in the result table.

Defaults to 10.

calc_lift : bool, optional

Whether or not to calculate lift values for all applicable cases.

If False, lift values are only calculated for the cases where the last transaction contains a single item.

Defaults to False.

timeout : int, optional

Specifies the maximum run time in seconds.

The algorithm will stop running when the specified timeout is reached.

Defaults to 3600.
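
The effect of ubiquitous can be illustrated without a HANA connection. The following is a minimal plain-Python sketch, not the PAL implementation. It assumes that an item's support is the fraction of customers whose sequence contains the item, and that "above" means strictly greater than the threshold. The helper names item_support and drop_ubiquitous, and the sample rows, are hypothetical.

```python
from collections import defaultdict

def item_support(rows):
    """Fraction of customers whose sequence contains each item.

    rows is an iterable of (customer, transaction, item) tuples.
    """
    customers = set()
    seen_by = defaultdict(set)
    for customer, _transaction, item in rows:
        customers.add(customer)
        seen_by[item].add(customer)
    return {item: len(c) / len(customers) for item, c in seen_by.items()}

def drop_ubiquitous(rows, ubiquitous=1.0):
    """Drop rows whose item support is strictly above the threshold (assumed semantics)."""
    support = item_support(rows)
    return [row for row in rows if support[row[2]] <= ubiquitous]

rows = [
    ("A", 1, "Apple"), ("A", 1, "Blueberry"), ("A", 2, "Apple"),
    ("A", 2, "Cherry"), ("A", 3, "Dessert"),
    ("B", 1, "Cherry"), ("B", 1, "Blueberry"), ("B", 1, "Apple"),
    ("B", 2, "Dessert"), ("B", 3, "Blueberry"),
    ("C", 1, "Apple"), ("C", 2, "Blueberry"), ("C", 3, "Dessert"),
]

# With the default of 1.0 nothing is dropped, because support cannot exceed 1.0.
kept_default = drop_ubiquitous(rows, ubiquitous=1.0)

# With 0.9, every item bought by all three customers is ignored.
kept_strict = drop_ubiquitous(rows, ubiquitous=0.9)
```

Under these assumptions, lowering ubiquitous below 1.0 is a way to keep items that nearly every customer buys from dominating the mined patterns.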

Examples

First, take a look at the input data df:

>>> df.collect()
   CUSTID  TRANSID      ITEMS
        A        1      Apple
        A        1  Blueberry
        A        2      Apple
        A        2     Cherry
        A        3    Dessert
        B        1     Cherry
        B        1  Blueberry
        B        1      Apple
        B        2    Dessert
        B        3  Blueberry
        C        1      Apple
        C        2  Blueberry
        C        3    Dessert

Set up an SPM instance:

>>> from hana_ml.algorithms.pal.association import SPM
>>> sp = SPM(min_support=0.5,
...          relational=False,
...          ubiquitous=1.0,
...          max_len=10,
...          min_len=1,
...          calc_lift=True)

Run sequential pattern mining on the input data and check the results:

>>> sp.fit(data=df, customer='CUSTID', transaction='TRANSID', item='ITEMS')
>>> sp.result_.collect()
                     PATTERN   SUPPORT  CONFIDENCE      LIFT
                     {Apple}  1.000000    0.000000  0.000000
         {Apple},{Blueberry}  0.666667    0.666667  0.666667
           {Apple},{Dessert}  1.000000    1.000000  1.000000
           {Apple,Blueberry}  0.666667    0.000000  0.000000
 {Apple,Blueberry},{Dessert}  0.666667    1.000000  1.000000
              {Apple,Cherry}  0.666667    0.000000  0.000000
    {Apple,Cherry},{Dessert}  0.666667    1.000000  1.000000
                 {Blueberry}  1.000000    0.000000  0.000000
       {Blueberry},{Dessert}  1.000000    1.000000  1.000000
                    {Cherry}  0.666667    0.000000  0.000000
         {Cherry},{Dessert}  0.666667    1.000000  1.000000
                  {Dessert}  1.000000    0.000000  0.000000
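
The statistics in the table above can be reproduced by hand. The following plain-Python sketch is not the PAL implementation, and the source does not state how PAL defines confidence and lift. It assumes that support is the fraction of customers whose sequence contains the pattern, that confidence is the support of the full pattern divided by the support of the pattern without its last transaction, and that lift is confidence divided by the support of the last transaction alone. These assumptions were chosen because they match the values shown. All function names are hypothetical.

```python
from collections import defaultdict

def build_sequences(rows):
    """Group (customer, transaction, item) rows into ordered lists of itemsets."""
    grouped = defaultdict(lambda: defaultdict(set))
    for customer, transaction, item in rows:
        grouped[customer][transaction].add(item)
    return {
        customer: [items for _tid, items in sorted(by_tid.items())]
        for customer, by_tid in grouped.items()
    }

def contains(sequence, pattern):
    """True if the pattern's itemsets occur, in order, in distinct transactions."""
    position = 0
    for itemset in pattern:
        # Advance to the earliest remaining transaction that covers this itemset.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True

def support(sequences, pattern):
    """Fraction of customers whose sequence contains the pattern."""
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)

def confidence_and_lift(sequences, pattern):
    """Assumed definitions; single-transaction patterns get zeros as in the table."""
    if len(pattern) < 2:
        return 0.0, 0.0
    confidence = support(sequences, pattern) / support(sequences, pattern[:-1])
    lift = confidence / support(sequences, pattern[-1:])
    return confidence, lift

rows = [
    ("A", 1, "Apple"), ("A", 1, "Blueberry"), ("A", 2, "Apple"),
    ("A", 2, "Cherry"), ("A", 3, "Dessert"),
    ("B", 1, "Cherry"), ("B", 1, "Blueberry"), ("B", 1, "Apple"),
    ("B", 2, "Dessert"), ("B", 3, "Blueberry"),
    ("C", 1, "Apple"), ("C", 2, "Blueberry"), ("C", 3, "Dessert"),
]
sequences = build_sequences(rows)

pattern = [{"Apple"}, {"Blueberry"}]
pattern_support = support(sequences, pattern)
pattern_confidence, pattern_lift = confidence_and_lift(sequences, pattern)
```

For {Apple},{Blueberry}, customer A does not count: A's only Blueberry purchase is in the same transaction as the first Apple, not in a later one. That leaves customers B and C, which gives the support of 0.666667 in the table.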

Attributes

result_ : DataFrame

The overall frequent pattern mining result, structured as follows:

1st column : mined frequent patterns,

2nd column : support values,

3rd column : confidence values,

4th column : lift values.

Available only when relational is False.

pattern_ : DataFrame

Result for mined frequent patterns, structured as follows:

1st column : pattern ID,
2nd column : transaction ID,
3rd column : items.

Available only when relational is True.

stats_ : DataFrame

Statistics for frequent pattern mining, structured as follows:

1st column : pattern ID,
2nd column : support values,
3rd column : confidence values,
4th column : lift values.

Available only when relational is True.
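
When relational is True, the two tables share a pattern ID, so they can be joined to recover a view similar to result_. The following is a minimal sketch in plain Python. The rows are hypothetical and only follow the column order described above; the literal IDs and the "{a,b},{c}" text format are assumptions for illustration, not output from PAL.

```python
from collections import defaultdict

# Hypothetical rows shaped like pattern_: (pattern ID, transaction ID, item)
pattern_rows = [
    (1, 1, "Apple"),
    (2, 1, "Apple"), (2, 1, "Blueberry"), (2, 2, "Dessert"),
]

# Hypothetical rows shaped like stats_: (pattern ID, support, confidence, lift)
stats_rows = [
    (1, 1.0, 0.0, 0.0),
    (2, 0.666667, 1.0, 1.0),
]

def format_patterns(pattern_rows):
    """Build one text pattern per pattern ID, grouping items by transaction ID."""
    by_pattern = defaultdict(lambda: defaultdict(list))
    for pattern_id, transaction_id, item in pattern_rows:
        by_pattern[pattern_id][transaction_id].append(item)
    return {
        pattern_id: ",".join(
            "{" + ",".join(sorted(items)) + "}"
            for _tid, items in sorted(by_tid.items())
        )
        for pattern_id, by_tid in by_pattern.items()
    }

def join_tables(pattern_rows, stats_rows):
    """Attach the formatted pattern text to each statistics row."""
    text = format_patterns(pattern_rows)
    return [(text[pid], sup, conf, lift) for pid, sup, conf, lift in stats_rows]

combined = join_tables(pattern_rows, stats_rows)
```

The relational form is convenient when patterns need to be filtered or joined by item in SQL, since each item sits in its own row instead of inside a formatted string.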

Methods

fit(data[, customer, transaction, item, ...])

Sequential pattern mining from input data.

property fit_hdbprocedure: Returns the generated hdbprocedure for fit.

property predict_hdbprocedure: Returns the generated hdbprocedure for predict.

fit(data, customer=None, transaction=None, item=None, item_restrict=None, min_gap=None)

Sequential pattern mining from input data.

Parameters

data : DataFrame

Input data for sequential pattern mining.

customer : str, optional

Column name of customer ID in the input data.

Defaults to the name of the first column if not provided.

transaction : str, optional

Column name of transaction ID in the input data.

For sequential pattern mining specifically, the values of this column must also reflect the order of occurrence.

Defaults to the name of the first non-customer column if not provided.

item : str, optional

Column name of item ID (or items) in the input data.

Defaults to the name of the last non-customer, non-transaction column if not provided.
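
The default rules for customer, transaction and item can be read as a small resolution procedure. The following sketch restates them in plain Python. The function resolve_columns is hypothetical and is not part of hana_ml; it only mirrors the defaults described above.

```python
def resolve_columns(columns, customer=None, transaction=None, item=None):
    """Apply the documented defaults to a list of column names."""
    if customer is None:
        customer = columns[0]  # first column
    remaining = [c for c in columns if c != customer]
    if transaction is None:
        transaction = remaining[0]  # first non-customer column
    remaining = [c for c in remaining if c != transaction]
    if item is None:
        item = remaining[-1]  # last non-customer, non-transaction column
    return customer, transaction, item

# All three names omitted: positions decide.
defaults = resolve_columns(["CUSTID", "TRANSID", "ITEMS"])

# Customer column is not first, so it is named explicitly.
reordered = resolve_columns(["TRANSID", "ITEMS", "CUSTID"], customer="CUSTID")
```

Passing the three names explicitly, as in the example call to fit, avoids depending on column order.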

item_restrict : list of int or str, optional

Specifies the list of items allowed in the mined association rule.

min_gap : int, optional

Specifies the minimum time difference between consecutive transactions in a sequence.
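
To make item_restrict and min_gap more concrete, here is a minimal plain-Python sketch of one possible reading. It is not the PAL implementation. It assumes that min_gap requires the transaction IDs of consecutive matched transactions to differ by at least min_gap, and that item_restrict keeps only patterns whose items all appear in the list. The function names are hypothetical.

```python
def contains_with_gap(sequence, pattern, min_gap=1):
    """Check pattern containment under an assumed minimum-gap constraint.

    sequence is a list of (transaction_id, itemset) pairs sorted by transaction_id.
    pattern is a list of itemsets.
    """
    def search(pattern_index, start, previous_tid):
        if pattern_index == len(pattern):
            return True
        for i in range(start, len(sequence)):
            tid, items = sequence[i]
            if previous_tid is not None and tid - previous_tid < min_gap:
                continue  # too close to the previously matched transaction
            if pattern[pattern_index] <= items and search(pattern_index + 1, i + 1, tid):
                return True
        return False

    return search(0, 0, None)

def pattern_allowed(pattern, item_restrict=None):
    """True if every item of the pattern is in item_restrict (assumed semantics)."""
    if item_restrict is None:
        return True
    allowed = set(item_restrict)
    return all(itemset <= allowed for itemset in pattern)

# Sequence of customer C from the example data.
sequence_c = [(1, {"Apple"}), (2, {"Blueberry"}), (3, {"Dessert"})]

adjacent_ok = contains_with_gap(sequence_c, [{"Apple"}, {"Blueberry"}], min_gap=1)
adjacent_blocked = contains_with_gap(sequence_c, [{"Apple"}, {"Blueberry"}], min_gap=2)
spaced_ok = contains_with_gap(sequence_c, [{"Apple"}, {"Dessert"}], min_gap=2)
restricted = pattern_allowed([{"Apple"}, {"Dessert"}], item_restrict=["Apple", "Blueberry"])
```

The search backtracks rather than always taking the earliest match, because with a gap constraint the earliest match for one itemset can rule out a later itemset that a different choice would allow.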

Inherited Methods from PALBase

In addition to the methods described above, the SPM class inherits methods from the PALBase class. Refer to PAL Base for more details.