hanaml.SPM.Rd
hanaml.SPM is an R wrapper for the SAP HANA PAL sequential pattern mining (SPM) algorithm.
hanaml.SPM(
data,
used.cols = NULL,
relational = NULL,
min.support,
ubiquitous = NULL,
min.event.size = NULL,
max.event.size = NULL,
min.event.length = NULL,
max.event.length = NULL,
item.restrict = NULL,
min.gap = NULL,
calculate.lift = NULL,
timeout = NULL
)
data: DataFrame
DataFrame containing the input transaction data.
used.cols: list of characters, optional
Specifies the columns in data that contain customer IDs, transaction IDs and item IDs.
For example, if the customer ID column of data is "CUSTID", the transaction ID column is "TRANSID" and the item ID column is "ITEMS", set this parameter as:
used.cols = list(customer = "CUSTID",
                 transaction = "TRANSID",
                 item = "ITEMS")
If not set, the customer ID, transaction ID and item ID columns default to the 1st, 2nd and 3rd columns of data, respectively.
relational: logical, optional
Specifies whether to apply relational logic to the output of the SPM algorithm.
This affects only how the sequential mining result is presented.
Defaults to FALSE.
min.support: double
User-specified minimum support value for rule generation.
ubiquitous: double, optional
User-specified maximum support value during the frequent-item mining phase: an item whose support value is above ubiquitous is ignored.
Defaults to 1.0.
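To illustrate how min.support and ubiquitous bound the frequent-item phase, the base-R sketch below computes item support on the example data from this page as the share of customers whose sequence contains the item. That definition is consistent with the support of 1.0 reported for {Apple} in the example statistics, but it is an assumption here, as is treating both thresholds as inclusive. The helpers item_support() and frequent_items() are hypothetical and not part of hana.ml.r; no HANA connection is involved.

```r
# Example transactions from this page as a plain base-R data.frame (no HANA connection)
trans <- data.frame(
  CUSTID  = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  TRANSID = c(1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 1, 2, 3),
  ITEMS   = c("Apple", "Blueberry", "Apple", "Cherry", "Dessert",
              "Cherry", "Blueberry", "Apple", "Dessert", "Blueberry",
              "Apple", "Blueberry", "Dessert"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: support of an item = share of customers that bought it at least once
item_support <- function(df) {
  n_customers <- length(unique(df$CUSTID))
  per_item <- tapply(df$CUSTID, df$ITEMS, function(ids) length(unique(ids)))
  per_item / n_customers
}

# Hypothetical helper: keep items with min_support <= support <= ubiquitous (inclusive is assumed)
frequent_items <- function(df, min_support, ubiquitous = 1.0) {
  s <- item_support(df)
  names(s)[s >= min_support & s <= ubiquitous]
}

item_support(trans)
frequent_items(trans, min_support = 0.5)                    # all four items
frequent_items(trans, min_support = 0.5, ubiquitous = 0.9)  # only "Cherry"
```

Lowering ubiquitous below 1.0 drops Apple, Blueberry and Dessert here, since every customer bought them.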
min.event.size: integer, optional
User-specified minimum number of items in an event.
Defaults to 1.
max.event.size: integer, optional
User-specified maximum number of items in an event.
Defaults to 10.
min.event.length: integer, optional
User-specified minimum length of events in the output.
Defaults to 1.
max.event.length: integer, optional
User-specified maximum length of events in the output.
Defaults to 10.
item.restrict: list of strings/integers, optional
Specifies which items are allowed in the association rules.
No default value.
min.gap: integer, optional
Specifies the minimum time difference between consecutive events of a sequence.
If the transaction ID column is of timestamp type, the unit of this parameter is seconds.
No default value.
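A minimal base-R sketch of what a minimum gap means when matching a pattern against one customer's sequence: the pattern counts only if consecutive matched events are at least min_gap apart. It assumes the gap is measured on the transaction IDs of the matched events and that a gap exactly equal to min_gap is accepted; PAL's boundary handling is not stated here. contains_with_gap() and the seq_* objects are hypothetical, built from the example data on this page.

```r
# Hypothetical per-customer sequences from the example data: transaction IDs and their itemsets
seq_A <- list(times = c(1, 2, 3),
              events = list(c("Apple", "Blueberry"), c("Apple", "Cherry"), "Dessert"))
seq_B <- list(times = c(1, 2, 3),
              events = list(c("Cherry", "Blueberry", "Apple"), "Dessert", "Blueberry"))
seq_C <- list(times = c(1, 2, 3),
              events = list("Apple", "Blueberry", "Dessert"))
seqs <- list(seq_A, seq_B, seq_C)

# Hypothetical helper: TRUE if the itemsets in `pattern` occur in order in `s`, with
# consecutive matched events at least `min_gap` apart (inclusive, by assumption).
# Earliest matching is safe: an earlier match never blocks a later one under a minimum gap.
contains_with_gap <- function(s, pattern, min_gap = 0) {
  pos <- 1
  last_time <- -Inf
  for (itemset in pattern) {
    # advance to the first later event holding every item and respecting the gap
    while (pos <= length(s$events) &&
           !(all(itemset %in% s$events[[pos]]) && s$times[pos] - last_time >= min_gap)) {
      pos <- pos + 1
    }
    if (pos > length(s$events)) return(FALSE)
    last_time <- s$times[pos]
    pos <- pos + 1
  }
  TRUE
}

p <- list("Apple", "Dessert")
mean(vapply(seqs, contains_with_gap, logical(1), pattern = p))               # no gap
mean(vapply(seqs, contains_with_gap, logical(1), pattern = p, min_gap = 2))  # gap of 2
```

With a gap of 2, customer B no longer supports {Apple} followed by {Dessert}, because its Dessert event comes right after the Apple event, so the support falls from 1 to 2/3.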
calculate.lift: logical, optional
FALSE: calculates lift values only for patterns whose last event contains a single item.
TRUE: calculates lift values for all applicable patterns, which takes extra time.
Defaults to FALSE.
timeout: integer, optional
Specifies the maximum run time in seconds for association rule mining.
The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.
An "SPM" object with the following attributes:
result: DataFrame
Mined frequent patterns with transaction IDs and item IDs, together with their support, confidence and lift values.
Available only when relational is FALSE.
pattern: DataFrame
Mined frequent patterns with transaction IDs and item IDs.
Available only when relational is TRUE.
statistics: DataFrame
Support/confidence/lift values of mined frequent patterns.
Available only when relational is TRUE.
The sequential pattern mining (SPM) algorithm searches for frequent patterns in sequence databases.
Input transaction DataFrame data:
> data$Collect()
CUSTID TRANSID ITEMS
1 A 1 Apple
2 A 1 Blueberry
3 A 2 Apple
4 A 2 Cherry
5 A 3 Dessert
6 B 1 Cherry
7 B 1 Blueberry
8 B 1 Apple
9 B 2 Dessert
10 B 3 Blueberry
11 C 1 Apple
12 C 2 Blueberry
13 C 3 Dessert
Creating an SPM object for mining association rules from the input data:
> sp <- hanaml.SPM(data = data, relational = TRUE,
                   used.cols = list(customer = "CUSTID",
                                    transaction = "TRANSID",
                                    item = "ITEMS"),
                   min.support = 0.5, calculate.lift = TRUE)
Check the mined frequent patterns stored in the attributes of the SPM object above:
> sp$pattern$Collect()
PATTERN_ID EVENT_ID ITEM
1 1 1 {Apple}
2 2 1 {Apple}
3 2 2 {Blueberry}
4 3 1 {Apple}
5 3 2 {Dessert}
6 4 1 {Apple,Blueberry}
7 5 1 {Apple,Blueberry}
8 5 2 {Dessert}
9 6 1 {Apple,Cherry}
10 7 1 {Apple,Cherry}
11 7 2 {Dessert}
12 8 1 {Blueberry}
13 9 1 {Blueberry}
14 9 2 {Dessert}
15 10 1 {Cherry}
16 11 1 {Cherry}
17 11 2 {Dessert}
18 12 1 {Dessert}
> sp$statistics$Collect()
PATTERN_ID SUPPORT CONFIDENCE LIFT
1 1 1.0000000 0.0000000 0.0000000
2 2 0.6666667 0.6666667 0.6666667
3 3 1.0000000 1.0000000 1.0000000
4 4 0.6666667 0.0000000 0.0000000
5 5 0.6666667 1.0000000 1.0000000
6 6 0.6666667 0.0000000 0.0000000
7 7 0.6666667 1.0000000 1.0000000
8 8 1.0000000 0.0000000 0.0000000
9 9 1.0000000 1.0000000 1.0000000
10 10 0.6666667 0.0000000 0.0000000
11 11 0.6666667 1.0000000 1.0000000
12 12 1.0000000 0.0000000 0.0000000
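The statistics above can be checked by hand. The base-R sketch below assumes that support is the share of customers whose sequence contains the pattern, confidence is the pattern's support divided by the support of its prefix (all events but the last), and lift is confidence divided by the support of the last event alone. These formulas are not taken from the PAL documentation; they simply agree with the rows checked here. All function names are hypothetical and not part of hana.ml.r.

```r
# Hypothetical customer sequences from the example data: ordered lists of events (itemsets)
seqs <- list(
  A = list(c("Apple", "Blueberry"), c("Apple", "Cherry"), "Dessert"),
  B = list(c("Cherry", "Blueberry", "Apple"), "Dessert", "Blueberry"),
  C = list("Apple", "Blueberry", "Dessert")
)

# TRUE if the itemsets in `pattern` occur in order in sequence `s` (earliest-match scan)
contains <- function(s, pattern) {
  pos <- 1
  for (itemset in pattern) {
    while (pos <= length(s) && !all(itemset %in% s[[pos]])) pos <- pos + 1
    if (pos > length(s)) return(FALSE)
    pos <- pos + 1
  }
  TRUE
}

# Support: share of customer sequences that contain the pattern
support <- function(pattern) mean(vapply(seqs, contains, logical(1), pattern = pattern))

# Confidence and lift as assumed above; single-event patterns get 0,
# mirroring the zeros shown for them in the sample output
confidence <- function(pattern) {
  if (length(pattern) < 2) return(0)
  support(pattern) / support(head(pattern, -1))
}
lift <- function(pattern) {
  if (length(pattern) < 2) return(0)
  confidence(pattern) / support(tail(pattern, 1))
}

p2 <- list("Apple", "Blueberry")                # PATTERN_ID 2
p5 <- list(c("Apple", "Blueberry"), "Dessert")  # PATTERN_ID 5
c(support(p2), confidence(p2), lift(p2))        # compare with row 2 of the statistics
c(support(p5), confidence(p5), lift(p5))        # compare with row 5 of the statistics
```

The earliest-match scan in contains() is sufficient here because, without gap constraints, matching an event as early as possible never prevents a later event from matching.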