Sequential Pattern Mining (SPM) — hanaml.SPM • hana.ml.r

hanaml.SPM is an R wrapper for SAP HANA PAL SPM.

hanaml.SPM(
  data,
  used.cols = NULL,
  relational = NULL,
  min.support,
  ubiquitous = NULL,
  min.event.size = NULL,
  max.event.size = NULL,
  min.event.length = NULL,
  max.event.length = NULL,
  item.restrict = NULL,
  min.gap = NULL,
  calculate.lift = NULL,
  timeout = NULL
)

Arguments

data	`DataFrame` DataFrame containing the data.
used.cols	`list of characters, optional` Specifies the columns in data that hold customer IDs, transaction IDs and item IDs. For example, if the customer ID column of data is "CUSTID", the transaction ID column is "TRANSID", and the item ID column is "ITEMS", set used.cols = list("customer" = "CUSTID", "transaction" = "TRANSID", "item" = "ITEMS"). If not set, the customer ID, transaction ID and item ID columns default to the 1st, 2nd and 3rd columns of data, respectively.
relational	`logical, optional` Specifies whether to apply relational logic to the output of the SPM algorithm. This only affects how the sequential mining result is presented. Defaults to FALSE.
min.support	`double` User-specified minimum support value for rule generation.
ubiquitous	`double, optional` User-specified maximum support value during the frequent items mining phase: an item whose support value is above ubiquitous is ignored. Defaults to 1.0.
min.event.size	`integer, optional` User-specified minimum number of items in an event. Defaults to 1.
max.event.size	`integer, optional` User-specified maximum number of items in an event. Defaults to 10.
min.event.length	`integer, optional` User-specified minimum length of events in the output. Defaults to 1.
max.event.length	`integer, optional` User-specified maximum length of events in the output. Defaults to 10.
item.restrict	`list of strings/integers, optional` Specifies which items are allowed in the association rule. No default value.
min.gap	`integer, optional` Specifies the minimum time difference between consecutive events of a sequence. If the data type of the input transaction ID is timestamp, the unit of this parameter is seconds. No default value.
calculate.lift	`logical, optional` Specifies the cases for which lift values are calculated. FALSE: only the cases where the last event contains a single item. TRUE: all applicable cases, which takes extra time. Defaults to FALSE.
timeout	`integer, optional` Specifies the maximum run time in seconds for association rule mining. The algorithm stops when the specified timeout is reached. Defaults to 3600.
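As a rough illustration of how min.support and ubiquitous bound the items that enter the mining phase, the following base R sketch filters items by support on a local copy of the data from the Examples section. It is not the PAL implementation. It assumes that item support is the share of customers who bought the item at least once and that the lower bound is inclusive; the value ubiquitous = 0.9 is hypothetical and chosen only to make the filter visible.

```r
# Local copy of the example transactions (see the Examples section).
tx <- data.frame(
  CUSTID  = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  TRANSID = c(1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 1, 2, 3),
  ITEMS   = c("Apple", "Blueberry", "Apple", "Cherry", "Dessert",
              "Cherry", "Blueberry", "Apple", "Dessert", "Blueberry",
              "Apple", "Blueberry", "Dessert"),
  stringsAsFactors = FALSE
)

# Assumption: support of an item = share of customers who bought it at least once.
n.customers  <- length(unique(tx$CUSTID))
item.support <- sapply(split(tx$CUSTID, tx$ITEMS),
                       function(ids) length(unique(ids))) / n.customers

min.support <- 0.5
ubiquitous  <- 0.9  # hypothetical value; the documented default is 1.0

# Keep items that are frequent enough but not above the ubiquitous threshold.
kept <- names(item.support)[item.support >= min.support &
                            item.support <= ubiquitous]
print(round(item.support, 4))
print(kept)
```

With these thresholds only Cherry remains, because the other three items are bought by every customer and therefore exceed the hypothetical ubiquitous value.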

Value

An "SPM" object with the following attributes:

result: DataFrame
Mined frequent patterns with transaction IDs and item IDs, together with their support, confidence and lift values. Available only when relational is FALSE.
pattern: DataFrame
Mined frequent patterns with transaction IDs and item IDs. Available only when relational is TRUE.
statistics: DataFrame
Support/confidence/lift values of mined frequent patterns. Available only when relational is TRUE.

Details

The sequential pattern mining (SPM) algorithm searches for frequent patterns in sequence databases.
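To make "frequent pattern in a sequence database" concrete, here is a minimal base R sketch that counts the support of a given sequential pattern by brute force. It is only an illustration under stated assumptions, not how PAL mines patterns: support is taken to be the share of customers whose ordered events contain the pattern, and the helper names to.sequence, contains.pattern and pattern.support are hypothetical. The data is a local copy of the table in the Examples section.

```r
# Local copy of the example transactions (see the Examples section).
tx <- data.frame(
  CUSTID  = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  TRANSID = c(1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 1, 2, 3),
  ITEMS   = c("Apple", "Blueberry", "Apple", "Cherry", "Dessert",
              "Cherry", "Blueberry", "Apple", "Dessert", "Blueberry",
              "Apple", "Blueberry", "Dessert"),
  stringsAsFactors = FALSE
)

# One ordered list of events (item sets) per customer.
to.sequence <- function(d) {
  d <- d[order(d$TRANSID), ]
  unname(split(d$ITEMS, d$TRANSID))
}
sequences <- lapply(split(tx, tx$CUSTID), to.sequence)

# TRUE if every event of `pattern` is contained in some event of `events`,
# in order, each one in a strictly later event than the previous match.
contains.pattern <- function(events, pattern) {
  pos <- 1
  for (ev in events) {
    if (pos <= length(pattern) && all(pattern[[pos]] %in% ev)) pos <- pos + 1
  }
  pos > length(pattern)
}

# Assumption: support = share of customers whose sequence contains the pattern.
pattern.support <- function(pattern) {
  mean(vapply(sequences, contains.pattern, logical(1), pattern = pattern))
}

print(pattern.support(list("Apple", "Blueberry")))
print(pattern.support(list(c("Apple", "Blueberry"), "Dessert")))
print(pattern.support(list("Apple", "Dessert")))
```

Under these assumptions the three calls give 2/3, 2/3 and 1, which agree with the SUPPORT values reported for the corresponding patterns in the Examples section.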

Examples

Input transaction DataFrame data:

> data$Collect()
   CUSTID TRANSID     ITEMS
1       A       1     Apple
2       A       1 Blueberry
3       A       2     Apple
4       A       2    Cherry
5       A       3   Dessert
6       B       1    Cherry
7       B       1 Blueberry
8       B       1     Apple
9       B       2   Dessert
10      B       3 Blueberry
11      C       1     Apple
12      C       2 Blueberry
13      C       3   Dessert

Creating an SPM object for mining association rules from the input data:

> sp <- hanaml.SPM(data = data, relational = TRUE,
                   used.cols = list("customer" = "CUSTID",
                                    "transaction" = "TRANSID",
                                    "item" = "ITEMS"),
                   min.support = 0.5, calculate.lift = TRUE)

Check the mined frequent patterns through the attributes of the SPM object above:

> sp$pattern$Collect()
   PATTERN_ID EVENT_ID              ITEM
1           1        1           {Apple}
2           2        1           {Apple}
3           2        2       {Blueberry}
4           3        1           {Apple}
5           3        2         {Dessert}
6           4        1 {Apple,Blueberry}
7           5        1 {Apple,Blueberry}
8           5        2         {Dessert}
9           6        1    {Apple,Cherry}
10          7        1    {Apple,Cherry}
11          7        2         {Dessert}
12          8        1       {Blueberry}
13          9        1       {Blueberry}
14          9        2         {Dessert}
15         10        1          {Cherry}
16         11        1          {Cherry}
17         11        2         {Dessert}
18         12        1         {Dessert}
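The pattern table stores one row per event, so a multi-event pattern spans several rows. The sketch below shows one possible way to collapse it into one readable line per pattern with base R. It assumes the collected result is an ordinary R data.frame with the columns shown above, and it works on a hand-copied subset of the rows (patterns 1 to 5) rather than on a live sp object.

```r
# Hand-copied subset of the pattern table above (patterns 1 to 5).
pattern <- data.frame(
  PATTERN_ID = c(1, 2, 2, 3, 3, 4, 5, 5),
  EVENT_ID   = c(1, 1, 2, 1, 2, 1, 1, 2),
  ITEM       = c("{Apple}", "{Apple}", "{Blueberry}", "{Apple}", "{Dessert}",
                 "{Apple,Blueberry}", "{Apple,Blueberry}", "{Dessert}"),
  stringsAsFactors = FALSE
)

# Order events within each pattern, then join them into a single string.
pattern  <- pattern[order(pattern$PATTERN_ID, pattern$EVENT_ID), ]
readable <- aggregate(ITEM ~ PATTERN_ID, data = pattern,
                      FUN = paste, collapse = " -> ")
print(readable)
```

Each row of readable then holds a whole sequence, for example "{Apple} -> {Blueberry}" for pattern 2.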

> sp$statistics$Collect()
   PATTERN_ID   SUPPORT CONFIDENCE      LIFT
1           1 1.0000000  0.0000000 0.0000000
2           2 0.6666667  0.6666667 0.6666667
3           3 1.0000000  1.0000000 1.0000000
4           4 0.6666667  0.0000000 0.0000000
5           5 0.6666667  1.0000000 1.0000000
6           6 0.6666667  0.0000000 0.0000000
7           7 0.6666667  1.0000000 1.0000000
8           8 1.0000000  0.0000000 0.0000000
9           9 1.0000000  1.0000000 1.0000000
10         10 0.6666667  0.0000000 0.0000000
11         11 0.6666667  1.0000000 1.0000000
12         12 1.0000000  0.0000000 0.0000000
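The CONFIDENCE and LIFT columns can be related back to SUPPORT. The following base R sketch assumes the usual definitions (confidence is the support of the whole pattern divided by the support of the pattern without its last event, and lift is confidence divided by the support of the last event alone); the page does not state these formulas, so treat them as an assumption that happens to reproduce the numbers above. The helper rule.stats and the text labels are hypothetical, and the support values are copied by hand from the output.

```r
# Supports copied from the statistics output above, keyed by a readable label.
support <- c(
  "{Apple}"                      = 1,
  "{Blueberry}"                  = 1,
  "{Dessert}"                    = 1,
  "{Apple,Blueberry}"            = 2/3,
  "{Apple}->{Blueberry}"         = 2/3,
  "{Apple,Blueberry}->{Dessert}" = 2/3
)

# Assumed definitions (not stated on this page):
#   confidence = support(whole pattern) / support(pattern without last event)
#   lift       = confidence / support(last event alone)
rule.stats <- function(whole, prefix, last) {
  confidence <- support[[whole]] / support[[prefix]]
  c(confidence = confidence, lift = confidence / support[[last]])
}

print(rule.stats("{Apple}->{Blueberry}", "{Apple}", "{Blueberry}"))
print(rule.stats("{Apple,Blueberry}->{Dessert}", "{Apple,Blueberry}", "{Dessert}"))
```

The first call yields 0.6667 for both values and the second yields 1 for both, matching patterns 2 and 5. Patterns that consist of a single event have no preceding event to condition on, and the output above reports their confidence and lift as 0.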