Sequential Pattern Mining (SPM) — hanaml.SPM • hana.ml.r

hanaml.SPM is an R wrapper for SAP HANA PAL SPM.

hanaml.SPM(
  data,
  used.cols = NULL,
  relational = NULL,
  min.support,
  ubiquitous = NULL,
  min.event.size = NULL,
  max.event.size = NULL,
  min.event.length = NULL,
  max.event.length = NULL,
  item.restrict = NULL,
  min.gap = NULL,
  calculate.lift = NULL,
  timeout = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

used.cols

list of characters, optional
Specifies the columns in data that hold customer IDs, transaction IDs and item IDs. For example, if the customer ID column of data is "CUSTID", the transaction ID column is "TRANSID", and the item ID column is "ITEMS", then the correct way to set this parameter is:

used.cols = list(customer = "CUSTID",
                 transaction = "TRANSID",
                 item = "ITEMS")

If not set, customer ID column defaults to the 1st column of data, transaction ID column defaults to the 2nd column of data, and item ID column defaults to the 3rd column of data.

relational

logical, optional
Specifies whether or not to apply relational logic to the output of the SPM algorithm. This only affects how the sequential mining result is presented, not which patterns are mined.
Defaults to FALSE.

min.support

double
User-specified minimum support value for rule generation.

ubiquitous

double, optional
User-specified maximum support value during the frequent items mining phase. Items whose support value is above ubiquitous are ignored.
Defaults to 1.0.

min.event.size

integer, optional
User-specified minimum number of items in an event.
Defaults to 1.

max.event.size

integer, optional
User-specified maximum number of items in an event.
Defaults to 10.

min.event.length

integer, optional
User-specified minimum length of events in the output.
Defaults to 1.

max.event.length

integer, optional
User-specified maximum length of events in the output.
Defaults to 10.

item.restrict

list of strings/integers, optional
Specifies which items are allowed in the association rule.
No default value.

min.gap

integer, optional
Specifies the minimum time difference between consecutive events of a sequence.
If the data type of the input transaction ID is timestamp, the unit of this parameter is seconds.
No default value.

calculate.lift

logical, optional

FALSE: Only calculates lift values for the cases that the last event contains only one item.
TRUE: Calculates lift values for all applicable cases. This will take extra time.

Defaults to FALSE.

timeout

integer, optional
Specifies the maximum run time in seconds for sequential pattern mining.
The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.
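
The min.support and ubiquitous thresholds described above can be illustrated locally in base R, without a HANA connection. The following is only a sketch under stated assumptions, not the PAL implementation: it assumes that the support of an item is the fraction of customers whose transactions contain it, that the lower bound is inclusive, and it uses a hypothetical ubiquitous value of 0.9. The data frame tx is a local copy of the transactions shown in the Examples section.

```r
# Local copy of the example transactions (plain data.frame, not a HANA DataFrame).
tx <- data.frame(
  CUSTID  = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  TRANSID = c(1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 1, 2, 3),
  ITEMS   = c("Apple", "Blueberry", "Apple", "Cherry", "Dessert",
              "Cherry", "Blueberry", "Apple", "Dessert", "Blueberry",
              "Apple", "Blueberry", "Dessert"),
  stringsAsFactors = FALSE
)

n_customers <- length(unique(tx$CUSTID))

# Assumed definition: support of an item = share of customers who bought it
# at least once (each customer is counted once per item).
item_support <- sapply(
  split(tx$CUSTID, tx$ITEMS),
  function(ids) length(unique(ids)) / n_customers
)

min_support <- 0.5  # same value as in the Examples section
ubiquitous  <- 0.9  # hypothetical value, chosen to show the upper cutoff

# Keep items that are frequent enough but not above the ubiquitous threshold.
kept <- names(item_support)[item_support >= min_support &
                              item_support <= ubiquitous]

print(round(item_support, 3))
print(kept)
```

With these hypothetical settings only Cherry remains, because the other three items appear for every customer and therefore exceed the ubiquitous threshold.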

Value

An "SPM" object with the following attributes:

result: DataFrame
Mined frequent patterns with transaction IDs and item IDs, together with their support, confidence and lift values.
Available only when relational is FALSE.
pattern: DataFrame
Mined frequent patterns with transaction IDs and item IDs. Available only when relational is TRUE.
statistics: DataFrame
Support/confidence/lift values of mined frequent patterns. Available only when relational is TRUE.

Details

The sequential pattern mining (SPM) algorithm searches for frequent patterns in sequence databases.

Examples

Input transaction DataFrame data:


> data$Collect()
   CUSTID TRANSID     ITEMS
1       A       1     Apple
2       A       1 Blueberry
3       A       2     Apple
4       A       2    Cherry
5       A       3   Dessert
6       B       1    Cherry
7       B       1 Blueberry
8       B       1     Apple
9       B       2   Dessert
10      B       3 Blueberry
11      C       1     Apple
12      C       2 Blueberry
13      C       3   Dessert

Creating an SPM object for mining sequential patterns from the input data:


> sp <- hanaml.SPM(data = data, relational = TRUE,
                   used.cols = list(customer = "CUSTID",
                                    transaction = "TRANSID",
                                    item = "ITEMS"),
                   min.support = 0.5, calculate.lift = TRUE)

Check the mined frequent patterns from the attributes of the above SPM object:


> sp$pattern$Collect()
   PATTERN_ID EVENT_ID              ITEM
1           1        1           {Apple}
2           2        1           {Apple}
3           2        2       {Blueberry}
4           3        1           {Apple}
5           3        2         {Dessert}
6           4        1 {Apple,Blueberry}
7           5        1 {Apple,Blueberry}
8           5        2         {Dessert}
9           6        1    {Apple,Cherry}
10          7        1    {Apple,Cherry}
11          7        2         {Dessert}
12          8        1       {Blueberry}
13          9        1       {Blueberry}
14          9        2         {Dessert}
15         10        1          {Cherry}
16         11        1          {Cherry}
17         11        2         {Dessert}
18         12        1         {Dessert}

> sp$statistics$Collect()
   PATTERN_ID   SUPPORT CONFIDENCE      LIFT
1           1 1.0000000  0.0000000 0.0000000
2           2 0.6666667  0.6666667 0.6666667
3           3 1.0000000  1.0000000 1.0000000
4           4 0.6666667  0.0000000 0.0000000
5           5 0.6666667  1.0000000 1.0000000
6           6 0.6666667  0.0000000 0.0000000
7           7 0.6666667  1.0000000 1.0000000
8           8 1.0000000  0.0000000 0.0000000
9           9 1.0000000  1.0000000 1.0000000
10         10 0.6666667  0.0000000 0.0000000
11         11 0.6666667  1.0000000 1.0000000
12         12 1.0000000  0.0000000 0.0000000
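
To relate the statistics table back to the input data, the values for PATTERN_ID 2 ({Apple} followed by {Blueberry}) can be recomputed locally in base R. This is a sketch under assumptions that happen to agree with the output above, not a description of how PAL computes them: support is taken as the fraction of customers whose sequence contains the pattern, confidence as the pattern's support divided by the support of the pattern without its last event, and lift as confidence divided by the support of the last event alone. The helper names contains_pattern and support are hypothetical.

```r
# Customer sequences from the example input: one list of events per customer,
# ordered by TRANSID; each event is the set of items in one transaction.
sequences <- list(
  A = list(c("Apple", "Blueberry"), c("Apple", "Cherry"), "Dessert"),
  B = list(c("Cherry", "Blueberry", "Apple"), "Dessert", "Blueberry"),
  C = list("Apple", "Blueberry", "Dessert")
)

# TRUE if the pattern's events occur, in order, in distinct events of the
# sequence. Greedy matching: each pattern event is matched to the earliest
# remaining sequence event that contains all of its items.
contains_pattern <- function(sequence, pattern) {
  k <- 1
  for (event in sequence) {
    if (k <= length(pattern) && all(pattern[[k]] %in% event)) k <- k + 1
  }
  k > length(pattern)
}

# Assumed definition: share of customers whose sequence contains the pattern.
support <- function(pattern) {
  mean(vapply(sequences, contains_pattern, logical(1), pattern = pattern))
}

# PATTERN_ID 2: {Apple} followed by {Blueberry}.
pattern_2 <- list("Apple", "Blueberry")
last <- length(pattern_2)

supp_2 <- support(pattern_2)
conf_2 <- supp_2 / support(pattern_2[-last])  # relative to the prefix {Apple}
lift_2 <- conf_2 / support(pattern_2[last])   # relative to {Blueberry} alone

cat(sprintf("support = %.7f, confidence = %.7f, lift = %.7f\n",
            supp_2, conf_2, lift_2))
```

Customer A does not count towards this pattern because Blueberry only appears in A's first transaction, together with Apple, and never in a later one. The sketch applies only to patterns with at least two events; single-event patterns are reported with confidence and lift of 0 in the table above.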