SPM

class hana_ml.algorithms.pal.association.SPM(min_support, relational=None, max_len=None, min_len=None, max_len_out=None, min_len_out=None, ubiquitous=None, calc_lift=None, timeout=None)

The Sequential Pattern Mining (SPM) algorithm is a data mining method for finding frequent patterns in sequential data. It can be applied in areas ranging from market basket analysis to medical data analysis. The algorithm's purpose is to identify patterns of purchase or occurrence over a sequence of time, highlighting patterns or trends in the data that may not be apparent at first.

Parameters:
min_support : float

Specifies the minimum support value. Any item with support less than the user-specified minimum support value is not included in the frequent item mining phase.
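
As a rough illustration of how a minimum support threshold works, the sketch below computes per-item support on a small hypothetical data set and keeps the items that reach the threshold. It assumes that support is the fraction of customers whose sequence contains the item, which is the usual definition in sequential pattern mining; the `rows` data and the `item_support` helper are illustrative and not part of hana_ml.

```python
from collections import defaultdict

# Hypothetical toy data: (customer, transaction, item) rows.
rows = [
    ("A", 1, "Apple"), ("A", 1, "Blueberry"), ("A", 2, "Dessert"),
    ("B", 1, "Apple"), ("B", 2, "Cherry"), ("B", 3, "Dessert"),
    ("C", 1, "Apple"), ("C", 2, "Blueberry"), ("C", 3, "Dessert"),
]


def item_support(rows):
    """Return {item: fraction of customers whose sequence contains it}.

    Assumption: support is counted once per customer, not per transaction.
    """
    customers = {cust for cust, _, _ in rows}
    seen_by = defaultdict(set)
    for cust, _, item in rows:
        seen_by[item].add(cust)
    return {item: len(custs) / len(customers) for item, custs in seen_by.items()}


support = item_support(rows)
min_support = 0.5
# Items below the threshold are dropped before pattern mining.
frequent_items = sorted(item for item, s in support.items() if s >= min_support)
print(frequent_items)
```

In this toy data, Cherry is bought by only one of three customers, so it falls below a threshold of 0.5 and is excluded.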

relational : bool, optional

Determines if relational logic should be applied in sequential pattern mining. If set to False, a single table for frequent pattern mining results is produced. Conversely, if set to True, the results table is split into two tables: one for mined patterns, and another for statistics.

Defaults to False.

ubiquitous : float, optional

Specifies the support threshold above which items are ignored during the frequent item mining phase.

Defaults to 1.0.
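
The interaction between min_support and ubiquitous can be pictured as a band filter on item support. The sketch below is a minimal illustration under the assumption that the ubiquitous limit is compared against an item's support; `select_items` and the support values are hypothetical and not part of hana_ml.

```python
def select_items(support, min_support, ubiquitous=1.0):
    """Keep items whose support lies in [min_support, ubiquitous].

    Assumption: items with support above `ubiquitous` are disregarded,
    and items with support below `min_support` are infrequent.
    """
    return sorted(
        item for item, s in support.items()
        if min_support <= s <= ubiquitous
    )


# Hypothetical per-item support values.
support = {"Bag": 1.0, "Apple": 0.8, "Cherry": 0.6, "Fig": 0.2}

kept_default = select_items(support, min_support=0.5)
kept_strict = select_items(support, min_support=0.5, ubiquitous=0.9)
print(kept_default, kept_strict)
```

With the default of 1.0 nothing is excluded for being too common; lowering the limit to 0.9 drops "Bag", which appears in every sequence.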

min_len : int, optional

Specifies the minimum number of items that a transaction must contain. Transactions with fewer items than this are not considered during pattern mining.

Defaults to 1.

max_len : int, optional

This parameter indicates the maximum number of items that can be present in a transaction.

Defaults to 10.

min_len_out : int, optional

This denotes the minimum number of items to be included in the mined association rules in the result table.

Defaults to 1.

max_len_out : int, optional

Specifies the maximum number of items of the mined association rules in the result table.

Defaults to 10.
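
The output length limits can be thought of as a final filter on the mined patterns. The sketch below is one possible reading, assuming that the length of a pattern is the total number of items across all of its itemsets; the list-of-sets representation and both helper functions are hypothetical and not part of hana_ml.

```python
def pattern_size(pattern):
    """Total number of items across all itemsets of a pattern (assumed measure)."""
    return sum(len(itemset) for itemset in pattern)


def filter_by_output_length(patterns, min_len_out=1, max_len_out=10):
    """Keep patterns whose item count lies in [min_len_out, max_len_out]."""
    return [p for p in patterns if min_len_out <= pattern_size(p) <= max_len_out]


# Hypothetical mined patterns, each a sequence of itemsets.
patterns = [
    [{"Apple"}],
    [{"Apple"}, {"Dessert"}],
    [{"Apple", "Blueberry"}, {"Dessert"}],
]

kept = filter_by_output_length(patterns, min_len_out=2, max_len_out=2)
print(kept)
```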

calc_lift : bool, optional

Specifies whether to compute lift values for all applicable cases. If set to False, lift values are computed only for cases where the last transaction contains a single item.

Defaults to False.

timeout : int, optional

Specifies the maximum run time for the algorithm in seconds. The algorithm will cease computation if the specified timeout is exceeded.

Defaults to 3600.

Examples

Input DataFrame df:

>>> df.collect()
   CUSTID  TRANSID      ITEMS
0       A        1      Apple
1       A        1  Blueberry
...
11      C        2  Blueberry
12      C        3    Dessert

Initialize an SPM object:

>>> sp = SPM(min_support=0.5,
             relational=False,
             ubiquitous=1.0,
             max_len=10,
             min_len=1,
             calc_lift=True)

Perform the fit() and obtain the result:

>>> sp.fit(data=df, customer='CUSTID', transaction='TRANSID', item='ITEMS')
>>> sp.result_.collect()
                        PATTERN   SUPPORT  CONFIDENCE      LIFT
0                       {Apple}  1.000000    0.000000  0.000000
1           {Apple},{Blueberry}  0.666667    0.666667  0.666667
2             {Apple},{Dessert}  1.000000    1.000000  1.000000
...
10           {Cherry},{Dessert}  0.666667    1.000000  1.000000
11                    {Dessert}  1.000000    0.000000  0.000000
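
The rows shown above appear consistent with the usual definitions of confidence and lift, where confidence is the support of the full pattern divided by the support of its leading part, and lift is the confidence divided by the support of the final part. The sketch below checks this against values visible in the output only; the two helper functions are illustrative, and these formulas are an assumption rather than a confirmed description of the PAL implementation.

```python
import math


def confidence(support_pattern, support_prefix):
    """Assumed definition: support(prefix -> last) / support(prefix)."""
    return support_pattern / support_prefix


def lift(conf, support_last):
    """Assumed definition: confidence / support(last)."""
    return conf / support_last


# Values taken from the result rows shown above.
support_apple = 1.0
support_dessert = 1.0
support_apple_blueberry = 0.666667
support_apple_dessert = 1.0
confidence_cherry_dessert = 1.0

conf_ab = confidence(support_apple_blueberry, support_apple)  # row 1
conf_ad = confidence(support_apple_dessert, support_apple)    # row 2
lift_ad = lift(conf_ad, support_dessert)                      # row 2
lift_cd = lift(confidence_cherry_dessert, support_dessert)    # row 10

print(conf_ab, conf_ad, lift_ad, lift_cd)
```

Single-item patterns such as {Apple} have no leading part, which may explain why their confidence and lift are reported as 0.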

Attributes:
result_ : DataFrame

The overall frequent pattern mining result, structured as follows:

  • 1st column : mined frequent patterns,

  • 2nd column : support values,

  • 3rd column : confidence values,

  • 4th column : lift values.

Available only when relational is False.

pattern_ : DataFrame
Result for mined frequent patterns, structured as follows:
  • 1st column : pattern ID,

  • 2nd column : transaction ID,

  • 3rd column : items.

Available only when relational is True.

stats_ : DataFrame
Statistics for frequent pattern mining, structured as follows:
  • 1st column : pattern ID,

  • 2nd column : support values,

  • 3rd column : confidence values,

  • 4th column : lift values.

Available only when relational is True.
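
To show how the single result table relates to the two relational tables, the sketch below splits rows of the first form into pattern rows and statistics rows. It is an illustration only: the integer pattern IDs, the use of the itemset position as the transaction ID, and the comma-separated format for multiple items inside braces are all assumptions, and `split_relational` is not part of hana_ml.

```python
import re


def split_relational(result_rows):
    """Split (pattern, support, confidence, lift) rows into two tables.

    Returns (pattern_rows, stats_rows), where pattern_rows holds
    (pattern ID, transaction ID, item) and stats_rows holds
    (pattern ID, support, confidence, lift).
    """
    pattern_rows, stats_rows = [], []
    for pattern_id, (pattern, support, conf, lift) in enumerate(result_rows, start=1):
        stats_rows.append((pattern_id, support, conf, lift))
        # Each "{...}" group is one transaction (itemset) of the pattern.
        itemsets = re.findall(r"\{([^}]*)\}", pattern)
        for position, itemset in enumerate(itemsets, start=1):
            for item in itemset.split(","):
                pattern_rows.append((pattern_id, position, item.strip()))
    return pattern_rows, stats_rows


# Two rows copied from the example result table.
result_rows = [
    ("{Apple}", 1.0, 0.0, 0.0),
    ("{Apple},{Dessert}", 1.0, 1.0, 1.0),
]

pattern_rows, stats_rows = split_relational(result_rows)
print(pattern_rows)
print(stats_rows)
```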

Methods

fit(data[, customer, transaction, item, ...])

Performs sequential pattern mining on the given data.

fit(data, customer=None, transaction=None, item=None, item_restrict=None, min_gap=None)

Performs sequential pattern mining on the given data.

Parameters:
data : DataFrame

The input data.

customer : str, optional

Column name of customer ID in the input data.

Defaults to name of the 1st column if not provided.

transaction : str, optional

Column name of transaction ID in the input data.

For sequential pattern mining, the values of this column must also reflect the order of occurrence.

Defaults to name of the 1st non-customer column if not provided.

item : str, optional

Column name of item ID (or items) in the input data.

Defaults to the name of the last non-customer, non-transaction column if not provided.
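
The default rules for the three column arguments can be sketched in plain Python. The function below mirrors the defaults as described above and is not taken from the hana_ml source; `resolve_columns` is a hypothetical helper.

```python
def resolve_columns(columns, customer=None, transaction=None, item=None):
    """Apply the documented defaults to a list of column names."""
    if customer is None:
        customer = columns[0]  # 1st column
    remaining = [c for c in columns if c != customer]
    if transaction is None:
        transaction = remaining[0]  # 1st non-customer column
    remaining = [c for c in remaining if c != transaction]
    if item is None:
        item = remaining[-1]  # last non-customer, non-transaction column
    return customer, transaction, item


columns = ["CUSTID", "TRANSID", "ITEMS"]
print(resolve_columns(columns))

# With a different column order, the defaults pick different columns,
# so passing the names explicitly is the safer choice.
print(resolve_columns(["ITEMS", "CUSTID", "TRANSID"], customer="CUSTID"))
```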

item_restrict : list of int or str, optional

Specifies the list of items allowed in the mined association rule.

No default value.

min_gap : int, optional

Specifies the minimum time difference between consecutive transactions in a sequence.

No default value.
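
One way to picture a minimum gap constraint is as a check on the transaction IDs matched by a candidate pattern. The sketch below assumes that transaction IDs are numeric and act as time stamps, and that every pair of consecutive matched transactions must be at least min_gap apart; `satisfies_min_gap` is a hypothetical helper, and the exact semantics in PAL may differ.

```python
def satisfies_min_gap(transaction_ids, min_gap):
    """Return True if consecutive transaction IDs differ by at least min_gap."""
    ordered = sorted(transaction_ids)
    return all(
        later - earlier >= min_gap
        for earlier, later in zip(ordered, ordered[1:])
    )


print(satisfies_min_gap([1, 2, 3], min_gap=1))  # adjacent transactions allowed
print(satisfies_min_gap([1, 2, 3], min_gap=2))  # adjacent transactions rejected
print(satisfies_min_gap([1, 3, 5], min_gap=2))  # every step is 2 apart
```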

Inherited Methods from PALBase

Besides the methods mentioned above, the SPM class also inherits methods from the PALBase class. Please refer to PAL Base for more details.