SPM
- class hana_ml.algorithms.pal.association.SPM(min_support, relational=None, max_len=None, min_len=None, max_len_out=None, min_len_out=None, ubiquitous=None, calc_lift=None, timeout=None)
The Sequential Pattern Mining (SPM) algorithm is a data mining method for finding frequent patterns in sequential data. It can be applied in areas ranging from market basket analysis to medical data analysis. The algorithm identifies patterns of purchase or occurrence over a sequence of time, highlighting patterns or trends in the data that may not be apparent at first.
- Parameters:
- min_support : float
Specifies the minimum support value. Any item with support less than the user-specified minimum support value is not included in the frequent item mining phase.
- relational : bool, optional
Determines if relational logic should be applied in sequential pattern mining. If set to False, a single table for frequent pattern mining results is produced. Conversely, if set to True, the results table is split into two tables: one for mined patterns, and another for statistics.
Defaults to False.
- ubiquitous : float, optional
Items whose support exceeds this value are disregarded during the frequent item mining phase.
Defaults to 1.0.
- min_len : int, optional
The minimum number of items in a transaction. Transactions that contain fewer items than this are not considered during the pattern mining process.
Defaults to 1.
- max_len : int, optional
This parameter indicates the maximum number of items that can be present in a transaction.
Defaults to 10.
- min_len_out : int, optional
This denotes the minimum number of items to be included in the mined association rules in the result table.
Defaults to 1.
- max_len_out : int, optional
Specifies the maximum number of items of the mined association rules in the result table.
Defaults to 10.
- calc_lift : bool, optional
Determines whether lift values are computed for all applicable cases. If set to False, lift values are computed only for cases where the last transaction contains a single item.
Defaults to False.
- timeout : int, optional
Specifies the maximum run time for the algorithm in seconds. The algorithm will cease computation if the specified timeout is exceeded.
Defaults to 3600.
Examples
Input DataFrame df:
>>> df.collect()
   CUSTID  TRANSID      ITEMS
0       A        1      Apple
1       A        1  Blueberry
...
11      C        2  Blueberry
12      C        3    Dessert
Initialize an SPM object:
>>> sp = SPM(min_support=0.5, relational=False, ubiquitous=1.0, max_len=10, min_len=1, calc_lift=True)
Perform the fit() and obtain the result:
>>> sp.fit(data=df, customer='CUSTID', transaction='TRANSID', item='ITEMS')
>>> sp.result_.collect()
                PATTERN   SUPPORT  CONFIDENCE      LIFT
0               {Apple}  1.000000    0.000000  0.000000
1   {Apple},{Blueberry}  0.666667    0.666667  0.666667
2     {Apple},{Dessert}  1.000000    1.000000  1.000000
...
10   {Cherry},{Dessert}  0.666667    1.000000  1.000000
11            {Dessert}  1.000000    0.000000  0.000000
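The SUPPORT column can be read as the share of customers whose ordered transactions contain the pattern. The following is a minimal pure-Python sketch of that usual definition, not PAL's actual implementation. The helper names (build_sequences, contains, support) and the toy rows are hypothetical and are not the data shown above.

```python
from collections import defaultdict

def build_sequences(rows):
    """Group (customer, transaction, item) rows into one ordered list
    of item sets per customer, sorted by transaction ID."""
    by_customer = defaultdict(lambda: defaultdict(set))
    for customer, transaction, item in rows:
        by_customer[customer][transaction].add(item)
    return {
        customer: [items for _, items in sorted(transactions.items())]
        for customer, transactions in by_customer.items()
    }

def contains(sequence, pattern):
    """Return True if every pattern element is a subset of some
    transaction in the sequence, with the order preserved."""
    pos = 0
    for element in pattern:
        # Advance to the earliest remaining transaction that covers the element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must match a later transaction
    return True

def support(sequences, pattern):
    """Fraction of customers whose sequence contains the pattern."""
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)

# Hypothetical toy data: (customer, transaction, item)
rows = [
    ("u1", 1, "tea"), ("u1", 1, "milk"), ("u1", 2, "cake"),
    ("u2", 1, "tea"), ("u2", 2, "milk"), ("u2", 3, "cake"),
    ("u3", 1, "cake"), ("u3", 2, "tea"),
]
sequences = build_sequences(rows)
tea_then_cake = support(sequences, [{"tea"}, {"cake"}])
```

With these toy rows, the pattern "tea, then cake" is found for two of the three customers, so its support is about 0.667. A pattern whose support falls below min_support would not be reported.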
- Attributes:
- result_ : DataFrame
The overall frequent pattern mining result, structured as follows:
1st column : mined frequent patterns,
2nd column : support values,
3rd column : confidence values,
4th column : lift values.
Available only when relational is False.
- pattern_ : DataFrame
Result for mined frequent patterns, structured as follows:
1st column : pattern ID,
2nd column : transaction ID,
3rd column : items.
Available only when relational is True.
- stats_ : DataFrame
Statistics for frequent pattern mining, structured as follows:
1st column : pattern ID,
2nd column : support values,
3rd column : confidence values,
4th column : lift values.
Available only when relational is True.
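To illustrate how the two relational tables relate to the single result table, the sketch below joins rows shaped like pattern_ and stats_ on the pattern ID and rebuilds one flat row per pattern. It is an illustration under assumptions: the function names, the pattern IDs, and the brace-delimited string format for multi-item transactions are hypothetical, and the rows are plain Python tuples rather than hana_ml DataFrames.

```python
from collections import defaultdict

def flatten_patterns(pattern_rows):
    """Turn (pattern_id, transaction_id, item) rows into one string
    per pattern, e.g. '{Apple},{Blueberry}'."""
    grouped = defaultdict(lambda: defaultdict(list))
    for pattern_id, transaction_id, item in pattern_rows:
        grouped[pattern_id][transaction_id].append(item)
    flat = {}
    for pattern_id, transactions in grouped.items():
        # Transactions are emitted in ID order; items within one are sorted.
        parts = [
            "{" + ",".join(sorted(items)) + "}"
            for _, items in sorted(transactions.items())
        ]
        flat[pattern_id] = ",".join(parts)
    return flat

def join_with_stats(pattern_rows, stats_rows):
    """Join on pattern ID to get (pattern, support, confidence, lift) rows."""
    flat = flatten_patterns(pattern_rows)
    return [
        (flat[pattern_id], sup, conf, lift)
        for pattern_id, sup, conf, lift in stats_rows
    ]

# Hypothetical rows in the shape of pattern_ and stats_
pattern_rows = [(1, 1, "Apple"), (2, 1, "Apple"), (2, 2, "Blueberry")]
stats_rows = [(1, 1.0, 0.0, 0.0), (2, 0.666667, 0.666667, 0.666667)]
flat_result = join_with_stats(pattern_rows, stats_rows)
```

The relational layout keeps one row per item, which is easier to filter or join in SQL, while the flat layout is easier to read.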
Methods
fit(data[, customer, transaction, item, ...]): Association rule mining on the given data.
- fit(data, customer=None, transaction=None, item=None, item_restrict=None, min_gap=None)
Association rule mining on the given data.
- Parameters:
- data : DataFrame
The input data.
- customer : str, optional
Column name of customer ID in the input data.
Defaults to name of the 1st column if not provided.
- transaction : str, optional
Column name of transaction ID in the input data.
For sequential pattern mining specifically, the values of this column must also reflect the order of occurrence.
Defaults to name of the 1st non-customer column if not provided.
- item : str, optional
Column name of item ID (or items) in the input data.
Defaults to the name of the last non-customer, non-transaction column if not provided.
- item_restrict : list of int or str, optional
Specifies the list of items allowed in the mined association rules.
No default value.
- min_gap : int, optional
Specifies the minimum time difference between consecutive transactions in a sequence.
No default value.
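As a rough illustration of what a minimum gap constraint means, the sketch below checks whether a pattern occurs in one customer's sequence when consecutive matched transactions must be at least min_gap apart. This is an assumption-laden sketch, not PAL's implementation: it treats the transaction ID as the time value, uses an inclusive (>=) comparison, and defaults min_gap to 0 purely for convenience. The function name contains_with_min_gap is hypothetical.

```python
def contains_with_min_gap(sequence, pattern, min_gap=0):
    """sequence: list of (time, item_set) pairs sorted by time.
    pattern: list of item sets to be matched in order.
    Consecutive matched transactions must differ by at least min_gap
    (assumed inclusive; the real comparison may differ)."""
    earliest = None  # earliest time allowed for the next match
    pos = 0
    for element in pattern:
        while pos < len(sequence):
            time, items = sequence[pos]
            pos += 1
            if (earliest is None or time >= earliest) and element <= items:
                earliest = time + min_gap
                break
        else:
            # Ran out of transactions without matching this element.
            return False
    return True

# Hypothetical sequence for one customer: (transaction time, items)
sequence = [(1, {"tea"}), (2, {"cake"}), (5, {"cake"})]
pattern = [{"tea"}, {"cake"}]

gap_1 = contains_with_min_gap(sequence, pattern, min_gap=1)
gap_3 = contains_with_min_gap(sequence, pattern, min_gap=3)
gap_5 = contains_with_min_gap(sequence, pattern, min_gap=5)
```

Here the pattern still matches with a gap of 1 or 3, because the transaction at time 5 qualifies, but not with a gap of 5. Matching each element at the earliest admissible transaction is sufficient when only a minimum gap applies.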
Inherited Methods from PALBase
In addition to the methods described above, the SPM class inherits methods from the PALBase class. Refer to PAL Base for more details.