SPM
- class hana_ml.algorithms.pal.association.SPM(min_support, relational=None, max_len=None, min_len=None, max_len_out=None, min_len_out=None, ubiquitous=None, calc_lift=None, timeout=None)
The sequential pattern mining algorithm searches for frequent patterns in sequence databases.
- Parameters
- min_supportfloat
User-specified minimum support value.
- relationalbool, optional
Whether or not to apply relational logic in sequential pattern mining.
If False, a single results table for frequent pattern mining is produced; otherwise the results table is split into two tables: one for mined patterns and the other for statistics.
Defaults to False.
- ubiquitousfloat, optional
Items whose support values are above this specified value will be ignored during the frequent item mining phase.
Defaults to 1.0.
- min_lenint, optional
Minimum number of items in a transaction.
Defaults to 1.
- max_lenint, optional
Maximum number of items in a transaction.
Defaults to 10.
- min_len_outint, optional
Specifies the minimum number of items of the mined association rules in the result table.
Defaults to 1.
- max_len_outint, optional
Specifies the maximum number of items of the mined association rules in the result table.
Defaults to 10.
- calc_liftbool, optional
Whether or not to calculate lift values for all applicable cases.
If False, lift values are only calculated for the cases where the last transaction contains a single item.
Defaults to False.
- timeoutint, optional
Specifies the maximum run time in seconds.
The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.
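To illustrate how min_support and ubiquitous bound the items that enter the mining phase, here is a minimal pure-Python sketch. It is not the PAL implementation. It assumes that an item's support is the fraction of customers whose sequence contains the item, that min_support is an inclusive lower bound, and that only items strictly above ubiquitous are ignored. The function name `candidate_items` and the tuple layout are hypothetical.

```python
from collections import defaultdict

def candidate_items(rows, min_support, ubiquitous=1.0):
    """Return {item: support} for items kept in the frequent item phase.

    rows: iterable of (customer, transaction, item) tuples.
    Assumption: support = share of customers that have the item at least once.
    """
    customers_by_item = defaultdict(set)
    all_customers = set()
    for customer, _transaction, item in rows:
        customers_by_item[item].add(customer)
        all_customers.add(customer)

    kept = {}
    for item, customers in customers_by_item.items():
        support = len(customers) / len(all_customers)
        # Keep items that are frequent enough but not above the ubiquitous bound.
        if min_support <= support <= ubiquitous:
            kept[item] = support
    return kept

# Toy data: three customers, a few transactions each.
rows = [
    ("A", 1, "Apple"), ("A", 1, "Blueberry"), ("A", 2, "Apple"),
    ("A", 2, "Cherry"), ("A", 3, "Dessert"),
    ("B", 1, "Cherry"), ("B", 1, "Blueberry"), ("B", 1, "Apple"),
    ("B", 2, "Dessert"), ("B", 3, "Blueberry"),
    ("C", 1, "Apple"), ("C", 2, "Blueberry"), ("C", 3, "Dessert"),
]

kept_default = candidate_items(rows, min_support=0.5)
kept_strict = candidate_items(rows, min_support=0.5, ubiquitous=0.9)
```

With the default ubiquitous=1.0 nothing is filtered out at the top end, because no support value can exceed 1.0. Lowering it to 0.9 in this toy data leaves only Cherry, since the other three items occur for every customer.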
Examples
First, take a look at the input data df:
>>> df.collect()
   CUSTID  TRANSID      ITEMS
0       A        1      Apple
1       A        1  Blueberry
2       A        2      Apple
3       A        2     Cherry
4       A        3    Dessert
5       B        1     Cherry
6       B        1  Blueberry
7       B        1      Apple
8       B        2    Dessert
9       B        3  Blueberry
10      C        1      Apple
11      C        2  Blueberry
12      C        3    Dessert
Set up an SPM instance:
>>> sp = SPM(min_support=0.5, relational=False, ubiquitous=1.0, max_len=10, min_len=1, calc_lift=True)
Run sequential pattern mining on the input data and check the results:
>>> sp.fit(data=df, customer='CUSTID', transaction='TRANSID', item='ITEMS')
>>> sp.result_.collect()
                        PATTERN   SUPPORT  CONFIDENCE      LIFT
0                       {Apple}  1.000000    0.000000  0.000000
1           {Apple},{Blueberry}  0.666667    0.666667  0.666667
2             {Apple},{Dessert}  1.000000    1.000000  1.000000
3             {Apple,Blueberry}  0.666667    0.000000  0.000000
4   {Apple,Blueberry},{Dessert}  0.666667    1.000000  1.000000
5                {Apple,Cherry}  0.666667    0.000000  0.000000
6      {Apple,Cherry},{Dessert}  0.666667    1.000000  1.000000
7                   {Blueberry}  1.000000    0.000000  0.000000
8         {Blueberry},{Dessert}  1.000000    1.000000  1.000000
9                      {Cherry}  0.666667    0.000000  0.000000
10           {Cherry},{Dessert}  0.666667    1.000000  1.000000
11                    {Dessert}  1.000000    0.000000  0.000000
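The SUPPORT, CONFIDENCE and LIFT values for the multi-transaction patterns in this table can be reproduced by hand. The sketch below is a pure-Python illustration, not the PAL implementation. It assumes that a pattern's support is the fraction of customers whose ordered transactions contain the pattern's item sets in order, that confidence is the pattern's support divided by the support of the pattern without its last item set, and that lift is confidence divided by the support of the last item set alone. These definitions agree with the values shown for this data but are not stated by the documentation, and all helper names are hypothetical.

```python
from collections import defaultdict

def build_sequences(rows):
    """Group (customer, transaction, item) rows into ordered lists of item sets."""
    per_customer = defaultdict(lambda: defaultdict(set))
    for customer, transaction, item in rows:
        per_customer[customer][transaction].add(item)
    return {
        customer: [transactions[t] for t in sorted(transactions)]
        for customer, transactions in per_customer.items()
    }

def contains(sequence, pattern):
    """True if the item sets of `pattern` occur in order in distinct transactions."""
    position = 0
    for itemset in pattern:
        # Greedily advance to the earliest transaction that covers this item set.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True

def support(sequences, pattern):
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)

def confidence(sequences, pattern):
    # Assumed definition: support(pattern) / support(pattern without last item set).
    return support(sequences, pattern) / support(sequences, pattern[:-1])

def lift(sequences, pattern):
    # Assumed definition: confidence / support(last item set on its own).
    return confidence(sequences, pattern) / support(sequences, [pattern[-1]])

rows = [
    ("A", 1, "Apple"), ("A", 1, "Blueberry"), ("A", 2, "Apple"),
    ("A", 2, "Cherry"), ("A", 3, "Dessert"),
    ("B", 1, "Cherry"), ("B", 1, "Blueberry"), ("B", 1, "Apple"),
    ("B", 2, "Dessert"), ("B", 3, "Blueberry"),
    ("C", 1, "Apple"), ("C", 2, "Blueberry"), ("C", 3, "Dessert"),
]
sequences = build_sequences(rows)
apple_then_blueberry = [{"Apple"}, {"Blueberry"}]
print(round(support(sequences, apple_then_blueberry), 6))
```

Customer A buys Blueberry only in the first transaction, so no Blueberry purchase follows an Apple purchase for A. Only B and C contain the pattern {Apple},{Blueberry}, which gives the support of 2/3 shown in row 1.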
- Attributes
- result_DataFrame
The overall frequent pattern mining result, structured as follows:
1st column : mined frequent patterns,
2nd column : support values,
3rd column : confidence values,
4th column : lift values.
Available only when relational is False.
- pattern_DataFrame
Result for mined frequent patterns, structured as follows:
1st column : pattern ID,
2nd column : transaction ID,
3rd column : items.
Available only when relational is True.
- stats_DataFrame
Statistics for frequent pattern mining, structured as follows:
1st column : pattern ID,
2nd column : support values,
3rd column : confidence values,
4th column : lift values.
Available only when relational is True.
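The relationship between the single result table and the two relational tables can be pictured with a small pure-Python sketch. It only illustrates the column layouts listed above. Numbering pattern IDs and transaction IDs from 1 and emitting one row per item are assumptions for illustration, not documented PAL behavior, and `to_relational` is a hypothetical helper.

```python
def to_relational(result_rows):
    """Split (pattern, support, confidence, lift) rows into two row lists.

    Returns (pattern_rows, stats_rows) where
      pattern_rows holds (pattern ID, transaction ID, item) and
      stats_rows holds (pattern ID, support, confidence, lift).
    """
    pattern_rows, stats_rows = [], []
    for pattern_id, row in enumerate(result_rows, start=1):
        pattern, support, confidence, lift = row
        # "{Apple,Blueberry},{Dessert}" -> ["Apple,Blueberry", "Dessert"]
        itemsets = pattern.strip("{}").split("},{")
        for transaction_id, itemset in enumerate(itemsets, start=1):
            for item in itemset.split(","):
                pattern_rows.append((pattern_id, transaction_id, item))
        stats_rows.append((pattern_id, support, confidence, lift))
    return pattern_rows, stats_rows

pattern_rows, stats_rows = to_relational([
    ("{Apple}", 1.0, 0.0, 0.0),
    ("{Apple,Blueberry},{Dessert}", 0.666667, 1.0, 1.0),
])
```

In this layout the pattern ID is the key that links each row of the statistics table back to the items of its pattern.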
Methods
fit(data[, customer, transaction, item, ...]) Sequential pattern mining from input data.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.
- fit(data, customer=None, transaction=None, item=None, item_restrict=None, min_gap=None)
Sequential pattern mining from input data.
- Parameters
- dataDataFrame
Input data for sequential pattern mining.
- customerstr, optional
Column name of customer ID in the input data.
Defaults to name of the 1st column if not provided.
- transactionstr, optional
Column name of transaction ID in the input data.
For sequential pattern mining, the values of this column must also reflect the order of occurrence.
Defaults to name of the 1st non-customer column if not provided.
- itemstr, optional
Column name of item ID (or items) in the input data.
Defaults to the name of the last non-customer, non-transaction column if not provided.
- item_restrictlist of int or str, optional
Specifies the list of items allowed in the mined association rule.
- min_gapint, optional
Specifies the minimum time difference between consecutive transactions in a sequence.
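The default rules for customer, transaction and item can be summarized in a short pure-Python sketch. It mirrors the wording of the parameter descriptions above and is not taken from the library source. `resolve_columns` is a hypothetical helper that works on a plain list of column names.

```python
def resolve_columns(columns, customer=None, transaction=None, item=None):
    """Apply the documented defaults to a list of column names."""
    columns = list(columns)
    if customer is None:
        # Defaults to the 1st column.
        customer = columns[0]
    remaining = [c for c in columns if c != customer]
    if transaction is None:
        # Defaults to the 1st non-customer column.
        transaction = remaining[0]
    remaining = [c for c in remaining if c != transaction]
    if item is None:
        # Defaults to the last non-customer, non-transaction column.
        item = remaining[-1]
    return customer, transaction, item

default_roles = resolve_columns(["CUSTID", "TRANSID", "ITEMS"])
reordered_roles = resolve_columns(["TRANSID", "CUSTID", "ITEMS"], customer="CUSTID")
```

For the three-column example table, calling fit without any column arguments would therefore pick the same roles as the explicit call in the example, assuming the defaults behave as described.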