hanaml.SPM {hana.ml.r}    R Documentation

Sequential Pattern Mining (SPM)

Description

The sequential pattern mining (SPM) algorithm searches for frequent patterns in sequence databases.

Usage

hanaml.SPM(conn.context,
              data = NULL,
              used.cols = NULL,
              relational = NULL,
              min.support,
              ubiquitous = NULL,
              min.event.size = NULL,
              max.event.size = NULL,
              min.event.length = NULL,
              max.event.length = NULL,
              item.restrict = NULL,
              min.gap = NULL,
              calculate.lift = NULL,
              timeout = NULL)

Arguments

conn.context

ConnectionContext
Database connection object.

data

DataFrame
Dataset used for sequential pattern mining.

used.cols

list of characters, optional
Specifies the columns in data that hold customer IDs, transaction IDs and item IDs. For example, if the customer ID column of data is "CUSTID", the transaction ID column is "TRANSID", and the item ID column is "ITEMS", then this parameter should be set as

used.cols = list("customer" = "CUSTID", "transaction" = "TRANSID", "item" = "ITEMS").

If not set, the customer ID column defaults to the 1st column of data, the transaction ID column to the 2nd column, and the item ID column to the 3rd column.
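The column defaults can be sketched in base R. The helper resolve.used.cols below is hypothetical and not part of hana.ml.r. It also assumes that a role named in used.cols overrides only its own positional default, which the text above does not state explicitly.

```r
# Hypothetical helper (not part of hana.ml.r): shows which columns would be
# picked for a given used.cols setting, following the documented defaults.
resolve.used.cols <- function(col.names, used.cols = NULL) {
  # Positional defaults: 1st = customer, 2nd = transaction, 3rd = item.
  resolved <- list(customer    = col.names[1],
                   transaction = col.names[2],
                   item        = col.names[3])
  # Assumption: each role named in used.cols replaces its positional default.
  for (role in names(used.cols)) {
    resolved[[role]] <- used.cols[[role]]
  }
  resolved
}

# Columns already in the default order: used.cols can be left unset.
defaults <- resolve.used.cols(c("CUSTID", "TRANSID", "ITEMS", "PRICE"))

# Columns in a different order: name every role explicitly.
explicit <- resolve.used.cols(c("TRANSID", "ITEMS", "CUSTID"),
                              list("customer" = "CUSTID",
                                   "transaction" = "TRANSID",
                                   "item" = "ITEMS"))
```

Naming all three roles explicitly avoids depending on the column order of data.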

relational

logical, optional
Specifies whether or not to apply relational logic to the output of SPM algorithm. This only affects the view of the sequential mining result. Defaults to FALSE.

min.support

double
User-specified minimum support value for rule generation.
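To make the support value concrete, the following base R sketch computes it locally as the share of customers whose event sequence contains a pattern. It is an illustration only, not how the algorithm runs in the database; the helper names contains.pattern and pattern.support are hypothetical. The data is the same as in the Examples section.

```r
# Same transactions as in the Examples section.
df <- data.frame(
  CUSTID  = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  TRANSID = c(1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 1, 2, 3),
  ITEMS   = c("Apple", "Blueberry", "Apple", "Cherry", "Dessert",
              "Cherry", "Blueberry", "Apple", "Dessert", "Blueberry",
              "Apple", "Blueberry", "Dessert"),
  stringsAsFactors = FALSE
)

# TRUE if the ordered list of events contains the pattern as a subsequence.
# Each pattern event must be a subset of a strictly later event than the
# one matched by the previous pattern event.
contains.pattern <- function(events, pattern) {
  pos <- 0
  for (wanted in pattern) {
    hit <- FALSE
    while (pos < length(events) && !hit) {
      pos <- pos + 1
      hit <- all(wanted %in% events[[pos]])
    }
    if (!hit) return(FALSE)
  }
  TRUE
}

# Support = share of customers whose sequence contains the pattern.
pattern.support <- function(df, pattern) {
  per.customer <- split(df, df$CUSTID)
  hits <- vapply(per.customer, function(d) {
    # One item set per transaction, ordered by transaction ID.
    events <- split(d$ITEMS, d$TRANSID)
    contains.pattern(events, pattern)
  }, logical(1))
  mean(hits)
}

s1 <- pattern.support(df, list("Apple", "Blueberry"))                # 2 of 3 customers
s2 <- pattern.support(df, list(c("Apple", "Blueberry"), "Dessert"))  # 2 of 3 customers
```

Both values equal 2/3, which agrees with the SUPPORT column for patterns 2 and 5 in the example output, so both patterns pass min.support = 0.5.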

ubiquitous

double, optional
User-specified maximum support value during the frequent items mining phase. Items whose support exceeds ubiquitous are ignored. Defaults to 1.0.
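A minimal base R sketch of this filter, computed locally on the example data. The threshold 0.9 is purely illustrative, and the item support is assumed here to be the share of customers who have the item in at least one transaction.

```r
# Same transactions as in the Examples section.
df <- data.frame(
  CUSTID = rep(c("A", "B", "C"), times = c(5, 5, 3)),
  ITEMS  = c("Apple", "Blueberry", "Apple", "Cherry", "Dessert",
             "Cherry", "Blueberry", "Apple", "Dessert", "Blueberry",
             "Apple", "Blueberry", "Dessert"),
  stringsAsFactors = FALSE
)

n.customers <- length(unique(df$CUSTID))

# Assumed definition: share of customers that have the item at least once.
item.support <- tapply(df$CUSTID, df$ITEMS,
                       function(x) length(unique(x))) / n.customers

ubiquitous <- 0.9  # illustrative threshold, not a recommended value

# Items with support above the threshold are dropped from mining.
kept <- names(item.support)[item.support <= ubiquitous]
```

With this data only Cherry (support 2/3) stays below the threshold. With the default of 1.0 no item can exceed the threshold, so nothing is dropped.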

min.event.size

integer, optional
User-specified minimum number of items in an event. Defaults to 1.

max.event.size

integer, optional
User-specified maximum number of items in an event. Defaults to 10.

min.event.length

integer, optional
User-specified minimum length of events in the output. Defaults to 1.

max.event.length

integer, optional
User-specified maximum length of events in the output. Defaults to 10.

item.restrict

list of strings/integers, optional
Specifies which items are allowed in the association rule.
No default value.

min.gap

integer, optional
Specifies the minimum time difference between consecutive events of a sequence.
If the data type of the input transaction ID is timestamp, the unit of this parameter is seconds.
No default value.
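The effect of a minimum gap can be sketched in base R for a single customer. The helper matches.with.gap is hypothetical. It assumes that min.gap is compared with the difference between the transaction IDs of consecutive matched events, which is one plausible reading of the description above, not a confirmed detail of the implementation.

```r
# Hypothetical helper: TRUE if `pattern` occurs in the customer's events with
# at least `min.gap` between the times of consecutive matched events.
matches.with.gap <- function(times, events, pattern, min.gap = 0) {
  earliest <- -Inf  # earliest admissible time for the next pattern event
  pos <- 0
  for (wanted in pattern) {
    hit <- FALSE
    while (pos < length(events) && !hit) {
      pos <- pos + 1
      hit <- times[pos] >= earliest && all(wanted %in% events[[pos]])
    }
    if (!hit) return(FALSE)
    earliest <- times[pos] + min.gap
  }
  TRUE
}

# Customer C from the example data: Apple (1), Blueberry (2), Dessert (3).
times  <- c(1, 2, 3)
events <- list("Apple", "Blueberry", "Dessert")

near.ok  <- matches.with.gap(times, events, list("Apple", "Blueberry"), min.gap = 1)
near.bad <- matches.with.gap(times, events, list("Apple", "Blueberry"), min.gap = 2)
far.ok   <- matches.with.gap(times, events, list("Apple", "Dessert"),   min.gap = 2)
```

Under this reading, raising min.gap from 1 to 2 removes {Apple} followed by {Blueberry} for this customer, while {Apple} followed by {Dessert} still qualifies.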

calculate.lift

logical, optional
Specifies for which patterns lift values are calculated.
- FALSE: Only calculates lift values for cases where the last event contains only one item.
- TRUE: Calculates lift values for all applicable cases. This takes extra time.
Defaults to FALSE.
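As an aid to reading the CONFIDENCE and LIFT columns, the sketch below uses the usual rule-style definitions with the last event of a pattern as the consequent. That the library uses exactly these definitions is an assumption; they do reproduce the values in the example output. The helper rule.stats is hypothetical.

```r
# Hypothetical helper. Assumed definitions:
#   confidence = support(whole pattern) / support(pattern without last event)
#   lift       = confidence / support(last event alone)
rule.stats <- function(support.full, support.prefix, support.last) {
  confidence <- support.full / support.prefix
  c(confidence = confidence, lift = confidence / support.last)
}

# Pattern 2 in the example: {Apple} -> {Blueberry}
# support 2/3, prefix {Apple} has support 1, {Blueberry} has support 1.
p2 <- rule.stats(2/3, 1, 1)

# Pattern 7 in the example: {Apple,Cherry} -> {Dessert}
# support 2/3, prefix {Apple,Cherry} has support 2/3, {Dessert} has support 1.
p7 <- rule.stats(2/3, 2/3, 1)
```

Under these definitions pattern 2 has confidence and lift of 2/3 and pattern 7 has confidence and lift of 1, as in the example statistics.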

timeout

integer, optional
Specifies the maximum run time in seconds for association rule mining. The algorithm stops running when the specified timeout is reached. Defaults to 3600.

Format

R6Class object.

Value

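An "SPM" object. The example below, which sets relational = TRUE, reads two of its attributes: pattern (the mined frequent patterns, one row per event) and statistics (support, confidence and lift for each pattern).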

Examples

## Not run: 
Input transaction data:

> df
   CUSTID TRANSID     ITEMS
1       A       1     Apple
2       A       1 Blueberry
3       A       2     Apple
4       A       2    Cherry
5       A       3   Dessert
6       B       1    Cherry
7       B       1 Blueberry
8       B       1     Apple
9       B       2   Dessert
10      B       3 Blueberry
11      C       1     Apple
12      C       2 Blueberry
13      C       3   Dessert

Creating an SPM object for mining association rules from the input data:

> sp <- hanaml.SPM(conn.context = conn, data = df, relational = TRUE,
                   used.cols = c("customer" = "CUSTID",
                                 "transaction" = "TRANSID",
                                 "item" = "ITEMS"),
                   min.support = 0.5, calculate.lift = TRUE)

Check the mined frequent patterns from the attributes of the above SPM object:

> sp$pattern
   PATTERN_ID EVENT_ID              ITEM
1           1        1           {Apple}
2           2        1           {Apple}
3           2        2       {Blueberry}
4           3        1           {Apple}
5           3        2         {Dessert}
6           4        1 {Apple,Blueberry}
7           5        1 {Apple,Blueberry}
8           5        2         {Dessert}
9           6        1    {Apple,Cherry}
10          7        1    {Apple,Cherry}
11          7        2         {Dessert}
12          8        1       {Blueberry}
13          9        1       {Blueberry}
14          9        2         {Dessert}
15         10        1          {Cherry}
16         11        1          {Cherry}
17         11        2         {Dessert}
18         12        1         {Dessert}

> sp$statistics
   PATTERN_ID   SUPPORT CONFIDENCE      LIFT
1           1 1.0000000  0.0000000 0.0000000
2           2 0.6666667  0.6666667 0.6666667
3           3 1.0000000  1.0000000 1.0000000
4           4 0.6666667  0.0000000 0.0000000
5           5 0.6666667  1.0000000 1.0000000
6           6 0.6666667  0.0000000 0.0000000
7           7 0.6666667  1.0000000 1.0000000
8           8 1.0000000  0.0000000 0.0000000
9           9 1.0000000  1.0000000 1.0000000
10         10 0.6666667  0.0000000 0.0000000
11         11 0.6666667  1.0000000 1.0000000
12         12 1.0000000  0.0000000 0.0000000

## End(Not run)
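The two result tables are easier to read when joined. The base R sketch below assumes that both results have already been collected into local data.frames; the collection step is not shown, and only the first three patterns of the output above are reproduced.

```r
# First three patterns of the example output, as local data.frames.
pattern <- data.frame(
  PATTERN_ID = c(1, 2, 2, 3, 3),
  EVENT_ID   = c(1, 1, 2, 1, 2),
  ITEM       = c("{Apple}", "{Apple}", "{Blueberry}", "{Apple}", "{Dessert}"),
  stringsAsFactors = FALSE
)
statistics <- data.frame(
  PATTERN_ID = c(1, 2, 3),
  SUPPORT    = c(1, 2/3, 1),
  CONFIDENCE = c(0, 2/3, 1),
  LIFT       = c(0, 2/3, 1)
)

# Order events within each pattern, then collapse them into one string.
pattern <- pattern[order(pattern$PATTERN_ID, pattern$EVENT_ID), ]
sequences <- aggregate(ITEM ~ PATTERN_ID, data = pattern,
                       FUN = paste, collapse = " -> ")

# One row per pattern: readable sequence plus its statistics.
readable <- merge(sequences, statistics, by = "PATTERN_ID")
```

Each row of readable then holds a full sequence, for example "{Apple} -> {Blueberry}" for pattern 2, next to its support, confidence and lift.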

[Package hana.ml.r version 1.0.8 Index]