R: sequential pattern mining (SPM)

hanaml.SPM {hana.ml.r}

R Documentation

sequential pattern mining (SPM)

Description

The sequential pattern mining (SPM) algorithm searches for frequent patterns in sequence databases.

Usage

hanaml.SPM(conn.context,
              data = NULL,
              used.cols = NULL,
              relational = NULL,
              min.support,
              ubiquitous = NULL,
              min.event.size = NULL,
              max.event.size = NULL,
              min.event.length = NULL,
              max.event.length = NULL,
              item.restrict = NULL,
              min.gap = NULL,
              calculate.lift = NULL,
              timeout = NULL)

Arguments

`conn.context`	`ConnectionContext` Database connection object.
`data`	`DataFrame` Dataset used for association rule mining.
`used.cols`	`list of characters, optional` Specifies the columns in `data` that hold customer IDs, transaction IDs and item IDs. For example, if the customer ID column of `data` is "CUSTID", the transaction ID column is "TRANSID", and the item ID column is "ITEMS", then the correct way to set this parameter is used.cols = list("customer" = "CUSTID", "transaction" = "TRANSID", "item" = "ITEMS"). If not set, the customer ID column defaults to the 1st column of `data`, the transaction ID column defaults to the 2nd column of `data`, and the item ID column defaults to the 3rd column of `data`.
`relational`	`logical, optional` Specifies whether or not to apply relational logic to the output of SPM algorithm. This only affects the view of the sequential mining result. Defaults to FALSE.
`min.support`	`double` User-specified minimum support value for rule generation.
`ubiquitous`	`double, optional` User-specified maximum support value during the frequent items mining phase, i.e. if an item has support value above ubiquitous, it shall be ignored. Defaults to 1.0.
`min.event.size`	`integer, optional` User-specified minimum number of items in an event. Defaults to 1.
`max.event.size`	`integer, optional` User-specified maximum number of items in an event. Defaults to 10.
`min.event.length`	`integer, optional` User-specified minimum length of events in the output. Defaults to 1.
`max.event.length`	`integer, optional` User-specified maximum length of events in the output. Defaults to 10.
`item.restrict`	`list of strings/integers, optional` Specifies which items are allowed in the association rule. No default value.
`min.gap`	`integer, optional` Specifies the minimum time difference between consecutive events of a sequence. If the data type of the input transaction ID is timestamp, the unit of this parameter is seconds. No default value.
`calculate.lift`	`logical, optional` Specifies how lift values are calculated. FALSE: only calculates lift values for cases where the last event contains a single item. TRUE: calculates lift values for all applicable cases, which takes extra time. Defaults to FALSE.
`timeout`	`integer, optional` Specifies the maximum run time in seconds for association rule mining. The algorithm stops running when the specified timeout is reached. Defaults to 3600.

Format

R6Class object.

Value

An "SPM" object with the following attributes:

result: DataFrame
Mined frequent patterns with transaction IDs and item IDs, together with their support, confidence and lift values.
Available only when relational is FALSE.
pattern: DataFrame
Mined frequent patterns with transaction IDs and item IDs. Available only when relational is TRUE.
statistics: DataFrame
Support/confidence/lift values of mined frequent patterns. Available only when relational is TRUE.
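
The relational output splits each mined pattern across two tables that share a PATTERN_ID column. The sketch below shows one way to recombine them into a single readable table. It assumes both tables have already been fetched into local R data frames (here, the first rows of the output in the Examples section are typed in by hand), and the `SEQUENCE` column name and the `" -> "` separator are illustrative choices, not part of the package.

```r
# Local copies of the first three patterns, in the layout shown
# in the Examples section (assumed to be plain R data frames).
pattern <- data.frame(
  PATTERN_ID = c(1, 2, 2, 3, 3),
  EVENT_ID   = c(1, 1, 2, 1, 2),
  ITEM       = c("{Apple}", "{Apple}", "{Blueberry}", "{Apple}", "{Dessert}"),
  stringsAsFactors = FALSE
)
statistics <- data.frame(
  PATTERN_ID = c(1, 2, 3),
  SUPPORT    = c(1.0000000, 0.6666667, 1.0000000),
  CONFIDENCE = c(0.0000000, 0.6666667, 1.0000000),
  LIFT       = c(0.0000000, 0.6666667, 1.0000000)
)

# Order events within each pattern, then collapse them into one string.
pattern <- pattern[order(pattern$PATTERN_ID, pattern$EVENT_ID), ]
sequences <- aggregate(ITEM ~ PATTERN_ID, data = pattern,
                       FUN = paste, collapse = " -> ")
names(sequences)[2] <- "SEQUENCE"  # hypothetical column name

# Attach support/confidence/lift to each collapsed pattern.
combined <- merge(sequences, statistics, by = "PATTERN_ID")
print(combined)
```

Each row of `combined` then describes one pattern as an ordered sequence of events next to its statistics.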

Examples

## Not run: 
Input transaction data:

> df
   CUSTID TRANSID     ITEMS
1       A       1     Apple
2       A       1 Blueberry
3       A       2     Apple
4       A       2    Cherry
5       A       3   Dessert
6       B       1    Cherry
7       B       1 Blueberry
8       B       1     Apple
9       B       2   Dessert
10      B       3 Blueberry
11      C       1     Apple
12      C       2 Blueberry
13      C       3   Dessert

Creating an SPM object for mining association rules from the input data:

> sp <- hanaml.SPM(conn.context = conn, data = df, relational = TRUE,
                   used.cols = list("customer" = "CUSTID",
                                    "transaction" = "TRANSID",
                                    "item" = "ITEMS"),
                   min.support = 0.5, calculate.lift = TRUE)

Check the mined frequent patterns from the attributes of the above SPM object:

> sp$pattern
   PATTERN_ID EVENT_ID              ITEM
1           1        1           {Apple}
2           2        1           {Apple}
3           2        2       {Blueberry}
4           3        1           {Apple}
5           3        2         {Dessert}
6           4        1 {Apple,Blueberry}
7           5        1 {Apple,Blueberry}
8           5        2         {Dessert}
9           6        1    {Apple,Cherry}
10          7        1    {Apple,Cherry}
11          7        2         {Dessert}
12          8        1       {Blueberry}
13          9        1       {Blueberry}
14          9        2         {Dessert}
15         10        1          {Cherry}
16         11        1          {Cherry}
17         11        2         {Dessert}
18         12        1         {Dessert}

> sp$statistics
   PATTERN_ID   SUPPORT CONFIDENCE      LIFT
1           1 1.0000000  0.0000000 0.0000000
2           2 0.6666667  0.6666667 0.6666667
3           3 1.0000000  1.0000000 1.0000000
4           4 0.6666667  0.0000000 0.0000000
5           5 0.6666667  1.0000000 1.0000000
6           6 0.6666667  0.0000000 0.0000000
7           7 0.6666667  1.0000000 1.0000000
8           8 1.0000000  0.0000000 0.0000000
9           9 1.0000000  1.0000000 1.0000000
10         10 0.6666667  0.0000000 0.0000000
11         11 0.6666667  1.0000000 1.0000000
12         12 1.0000000  0.0000000 0.0000000

## End(Not run)
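
To make the statistics above easier to interpret, the following base R sketch recomputes the values for pattern 2, {Apple} followed by {Blueberry}, directly from the input table, without a database connection. The helpers `contains_pattern` and `pattern_support` are hypothetical and not part of hana.ml.r. The definitions used here are assumptions that happen to be consistent with the output shown: support is the fraction of customers whose transactions contain the pattern in order, confidence is the pattern's support divided by the support of the pattern without its last event, and lift is the confidence divided by the support of the last event.

```r
# Input data copied from the example table above.
df <- data.frame(
  CUSTID  = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  TRANSID = c(1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 1, 2, 3),
  ITEMS   = c("Apple", "Blueberry", "Apple", "Cherry", "Dessert",
              "Cherry", "Blueberry", "Apple", "Dessert", "Blueberry",
              "Apple", "Blueberry", "Dessert"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if one customer's transactions contain the
# events of `pattern` in order, each in a strictly later transaction.
contains_pattern <- function(cust, pattern) {
  last <- -Inf
  for (event in pattern) {
    ids <- sort(unique(cust$TRANSID[cust$TRANSID > last]))
    hit <- NA
    for (t in ids) {
      # The event matches if the transaction holds all of its items.
      if (all(event %in% cust$ITEMS[cust$TRANSID == t])) {
        hit <- t
        break
      }
    }
    if (is.na(hit)) return(FALSE)
    last <- hit
  }
  TRUE
}

# Hypothetical helper: fraction of customers that contain the pattern.
pattern_support <- function(data, pattern) {
  by_cust <- split(data, data$CUSTID)
  mean(vapply(by_cust, contains_pattern, logical(1), pattern = pattern))
}

s_ab <- pattern_support(df, list("Apple", "Blueberry"))  # {Apple} then {Blueberry}
s_a  <- pattern_support(df, list("Apple"))               # pattern without last event
s_b  <- pattern_support(df, list("Blueberry"))           # last event alone

confidence <- s_ab / s_a       # assumed definition of confidence
lift       <- confidence / s_b # assumed definition of lift
print(c(support = s_ab, confidence = confidence, lift = lift))
```

Customer A buys Blueberry only in the first transaction, so no Blueberry follows an Apple for that customer. Only B and C contain the pattern, which gives the support of 2/3 reported for pattern 2.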

[Package hana.ml.r version 1.0.8 Index]