hanaml.AprioriLite.Rd
Lite Apriori is an R wrapper for SAP HANA PAL LITE_APRIORI.
hanaml.AprioriLite(
data,
used.cols = NULL,
min.support,
min.confidence,
thread.ratio = NULL,
subsample = NULL,
recalculate = NULL,
timeout = NULL,
pmml.export = NULL
)
DataFrame
DataFrame containing the data.
list of characters, optional
Specifies the columns in data
that contain the transaction IDs and item IDs.
For example, if the transaction ID column of data
is "CUSTOMER" and the item ID column
is "ITEM", then the correct way to set
this parameter is
used.cols = list("transaction" = "CUSTOMER", "item" = "ITEM")
The transaction ID column defaults to the 1st column of data,
while the item ID column defaults to the 2nd column of data.
numeric
User-specified minimum support value for rule generation.
numeric
User-specified minimum confidence value for rule generation.
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads.
Values between 0 and 1 will use up to
that percentage of available threads. Values outside this
range are ignored.
Defaults to 0.
double, optional
User-specified subsampling rate of data
used for rule mining, ranging from 0 to 1.
Set to 1 to use the entire input data.
Defaults to 1.
logical, optional
If subsampled, this parameter controls whether or not to use the remaining data to
update the computed support, confidence and lift values.
Valid only when subsample
is not 1.
Defaults to TRUE.
integer, optional
Specifies the maximum run time in seconds for association rule mining.
The algorithm will stop running when the specified timeout is reached.
Defaults to 3600.
c("no", "single-row", "multi-row"), optional
Controls whether to output a PMML representation of the model,
and how to format the PMML.
"no":
No PMML model.
"single-row":
Exports a PMML model in a maximum of
one row. Fails if the model doesn't fit in one row.
"multi-row":
Exports a PMML model, splitting it
across multiple rows if it doesn't fit in one.
Defaults to "no".
An "AprioriLite" object with the following attributes:
result: DataFrame
Mined association rules as a whole.
Each rule has its antecedent/consequent items and support/confidence/lift values.
Input DataFrame data:
> data$Collect()
CUSTOMER ITEM
1 2 item2
2 2 item3
3 3 item1
4 3 item2
5 3 item4
6 4 item1
7 4 item3
8 5 item2
9 5 item3
10 6 item1
11 6 item3
12 0 item1
13 0 item2
14 0 item5
15 1 item2
16 1 item4
17 7 item1
18 7 item2
19 7 item3
20 7 item5
21 8 item1
22 8 item2
23 8 item3
Call the function:
> apl <- hanaml.AprioriLite(data = data,
                            used.cols = c(transaction = "CUSTOMER",
                                          item = "ITEM"),
                            min.support = 0.1, min.confidence = 0.3,
                            pmml.export = "single-row")
Output:
> apl$result$Collect()
ANTECEDENT CONSEQUENT SUPPORT CONFIDENCE LIFT
1 item5 item2 0.2222222 1.0000000 1.2857143
2 item1 item5 0.2222222 0.3333333 1.5000000
3 item5 item1 0.2222222 1.0000000 1.5000000
4 item5 item3 0.1111111 0.5000000 0.7500000
5 item1 item2 0.4444444 0.6666667 0.8571429
6 item2 item1 0.4444444 0.5714286 0.8571429
7 item4 item2 0.2222222 1.0000000 1.2857143
8 item3 item2 0.4444444 0.6666667 0.8571429
9 item2 item3 0.4444444 0.5714286 0.8571429
10 item4 item1 0.1111111 0.5000000 0.7500000
11 item3 item1 0.4444444 0.6666667 1.0000000
12 item1 item3 0.4444444 0.6666667 1.0000000
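The SUPPORT, CONFIDENCE and LIFT columns can be cross-checked locally. The following is a minimal base R sketch, not part of the package: it rebuilds the example transactions as an ordinary data.frame (no SAP HANA connection) and recomputes the three values for a single rule using the usual textbook definitions. The names tx, baskets and rule_metrics are hypothetical helpers introduced only for this illustration, and it is an assumption that PAL applies exactly these definitions.

```r
# Rebuild the example transactions shown above as a plain data.frame
# (local data only; no SAP HANA connection is involved).
tx <- data.frame(
  CUSTOMER = c(2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0, 0,
               1, 1, 7, 7, 7, 7, 8, 8, 8),
  ITEM = c("item2", "item3", "item1", "item2", "item4", "item1", "item3",
           "item2", "item3", "item1", "item3", "item1", "item2", "item5",
           "item2", "item4", "item1", "item2", "item3", "item5",
           "item1", "item2", "item3"),
  stringsAsFactors = FALSE
)

# One character vector of items per transaction ID.
baskets <- split(tx$ITEM, tx$CUSTOMER)
n <- length(baskets)

# rule_metrics() is a hypothetical helper, not a hana.ml.r function.
# It evaluates a single-item rule "antecedent => consequent".
rule_metrics <- function(antecedent, consequent) {
  has_a <- vapply(baskets, function(b) antecedent %in% b, logical(1))
  has_c <- vapply(baskets, function(b) consequent %in% b, logical(1))
  # Support: share of transactions containing both items.
  support <- sum(has_a & has_c) / n
  # Confidence: share of antecedent transactions that also hold the consequent.
  confidence <- sum(has_a & has_c) / sum(has_a)
  # Lift: confidence relative to the consequent's overall frequency.
  lift <- confidence / (sum(has_c) / n)
  c(support = support, confidence = confidence, lift = lift)
}

m <- rule_metrics("item5", "item2")
print(round(m, 7))
```

For the rule item5 => item2 this gives a support of 2/9, a confidence of 1 and a lift of 9/7, which agrees with row 1 of the output above.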