K-Optimal Rule Discovery (KORD)

K-optimal rule discovery (KORD) follows the idea of generating association rules with respect to a well-defined measure, instead of first finding all frequent itemsets and then generating all possible rules. The algorithm only calculates the top-k rules according to that measure. The size of the right hand side (RHS) of those rules is restricted to one. Futhermore, the KORD implementation generates only non-redundant rules.

The algorithm's search strategy is based on the so-called OPUS search. While the search space of all possible LHSs is traversed in a depth-first manner, the information about all qualified RHSs of the rules for a given LHS is propagated further to the deeper search levels. KORD does not build a real tree search structure; instead it traverses the LHSs in a specific order, which allows the pruning of the search space by simply not visiting those itemsets subsequently. In this way it is possible to use pruning rules which restrict the possible LHSs and RHSs at different rule generation stages.

Prerequisites

  • There are no duplicated items in each transaction.
  • The input data does not contain null value. The algorithm will issue errors when encountering null values.

KORD

Procedure Generation

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE(‘AFLPAL’, ‘KORD’, ‘<schema_name>’, '<procedure_name>', <signature_table>);

The signature table should contain the following records:

Position Schema Name Table Type Name Parameter Type
1 <schema_name> <TRANSACTION table type> IN
2 <schema_name> <PARAMETER table type> IN
3 <schema_name> <Rules OUTPUT table type> OUT
4 <schema_name> <Antecedent table type> OUT
5 <schema_name> <Consequent table type> OUT

Procedure Calling

CALL <schema_name>.<procedure_name>(<transaction_table>, <parameter_table>, <rules_output_table>, <antecedent_output_table>, <consequent_output_table>) with overview;

The procedure name is the same as specified in the procedure generation.

The input, parameter, and output tables must be of the types specified in the signature table.

Signature

Transaction Table

Table Column Column Data Type Description
Data 1st column Integer, varchar, or nvarchar Transaction ID
2nd column Integer, varchar, or nvarchar Item ID

Parameter Table

Mandatory Parameters

None.

Optional Parameters

The following parameters are optional. If a parameter is not specified, PAL will use its default value.

Name Data Type Default Value Description Dependency
TOPK Integer 10 Specifies the number (k) of top rules.  
MAX_ANTECENDENT Integer 4 Specifies the maximum length of antecedent rules.  
MEASURE_TYPE Integer 0 Specifies the measure that will be used to define the priority of the rules.
  • 0: Leverage
  • 1: Lift
 
IS_USE_EPSILON Integer 0 Controls whether to use epsilon to punish the length of rules:
  • 0: Does not use epsilon
  • 1: Uses epsilon
 
THREAD_NUMBER Integer 1 Specifies the number of threads.  
MIN_SUPPORT Double 0.0 Specifies the minimum support.  
MIN_CONFIDENCE Double 0.0 Specifies the minimum confidence.  
MIN_COVERAGE Double The value of MIN_SUPPORT Specifies the minimum coverage.

Default: T

 
MIN_MEASURE Double 0.0 Specifies the minimum measure value for leverage or lift, dependent on the MEASURE_TYPE setting.  
EPSILON Double 0.0 Epsilon value. Only valid when IS_USE_EPSILON is 1.

Output Tables

Table Column Column Data Type Description
Rules 1st column Integer ID
2nd column Double Support
3rd column Double Confidence
4th column Double Lift
5th column Double Leverage
6th column Double Measure value
Antecedent 1st column Integer ID
2nd column Varchar or nvarchar Antecedent items
Consequent 1st column Integer ID
2nd column Varchar or nvarchar Consequent items

Example

Assume that:
  • DM_PAL is a schema belonging to USER1; and

  • USER1 has been assigned the AFLPM_CREATOR_ERASER_EXECUTE role; and

  • USER1 has been assigned the AFL__SYS_AFL_AFLPAL_EXECUTE or AFL__SYS_AFL_AFLPAL_EXECUTE_WITH_GRANT_OPTION role.

SET SCHEMA DM_PAL;

DROP TYPE PAL_KORD_DATA_T;
CREATE TYPE PAL_KORD_DATA_T AS TABLE(
"CUSTOMER" INTEGER,
"ITEM" VARCHAR(20)
);

DROP TYPE PAL_KORD_RULES_T;
CREATE TYPE PAL_KORD_RULES_T AS TABLE(
"ID" INTEGER,
"SUPPORT" DOUBLE, 
"CONFIDENCE" DOUBLE,
"LIFT" DOUBLE,
"LEVERAGE" DOUBLE,
"MEASURE" DOUBLE
);

DROP TYPE PAL_KORD_ANTE_ITEMS_T;
CREATE TYPE PAL_KORD_ANTE_ITEMS_T AS TABLE(
"ID" INTEGER,
"ANTECEDENT" VARCHAR(20)
);
DROP TYPE PAL_KORD_CONS_ITEMS_T;
CREATE TYPE PAL_KORD_CONS_ITEMS_T AS TABLE(
	"ID" INTEGER,
	"CONSEQUENT" VARCHAR(20)
);

DROP TYPE PAL_CONTROL_T;	
CREATE TYPE PAL_CONTROL_T AS TABLE(
	"NAME" VARCHAR(100),
	"INTARGS" INTEGER, 
	"DOUBLEARGS" DOUBLE,
	"STRINGARGS" VARCHAR (100)
);

DROP TABLE PAL_KORD_PDATA_TBL;
CREATE COLUMN TABLE PAL_KORD_PDATA_TBL(
	"POSITION" INT,
	"SCHEMA_NAME" NVARCHAR(256),
	"TYPE_NAME" NVARCHAR(256),
	"PARAMETER_TYPE" VARCHAR(7)
);
INSERT INTO PAL_KORD_PDATA_TBL VALUES (1, 'DM_PAL', 'PAL_KORD_DATA_T', 'IN'); 
INSERT INTO PAL_KORD_PDATA_TBL VALUES (2, 'DM_PAL', 'PAL_CONTROL_T', 'IN'); 
INSERT INTO PAL_KORD_PDATA_TBL VALUES (3, 'DM_PAL', 'PAL_KORD_RULES_T', 'OUT');
INSERT INTO PAL_KORD_PDATA_TBL VALUES (4, 'DM_PAL', 'PAL_KORD_ANTE_ITEMS_T', 'OUT');
INSERT INTO PAL_KORD_PDATA_TBL VALUES (5, 'DM_PAL', 'PAL_KORD_CONS_ITEMS_T', 'OUT'); 	

CALL "SYS".AFLLANG_WRAPPER_PROCEDURE_DROP('DM_PAL','PAL_KORD_PROC');
call "SYS".AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL', 'KORD', 'DM_PAL', 'PAL_KORD_PROC', PAL_KORD_PDATA_TBL);

DROP TABLE PAL_KORD_DATA_TBL;
CREATE COLUMN TABLE PAL_KORD_DATA_TBL LIKE PAL_KORD_DATA_T; 
INSERT INTO PAL_KORD_DATA_TBL VALUES (2, 'item2');
INSERT INTO PAL_KORD_DATA_TBL VALUES (2, 'item3');
INSERT INTO PAL_KORD_DATA_TBL VALUES (3, 'item1');
INSERT INTO PAL_KORD_DATA_TBL VALUES (3, 'item2');
INSERT INTO PAL_KORD_DATA_TBL VALUES (3, 'item4');
INSERT INTO PAL_KORD_DATA_TBL VALUES (4, 'item1');
INSERT INTO PAL_KORD_DATA_TBL VALUES (4, 'item3');
INSERT INTO PAL_KORD_DATA_TBL VALUES (5, 'item2');
INSERT INTO PAL_KORD_DATA_TBL VALUES (5, 'item3');
INSERT INTO PAL_KORD_DATA_TBL VALUES (6, 'item1');
INSERT INTO PAL_KORD_DATA_TBL VALUES (6, 'item3');
INSERT INTO PAL_KORD_DATA_TBL VALUES (0, 'item1');
INSERT INTO PAL_KORD_DATA_TBL VALUES (0, 'item2');
INSERT INTO PAL_KORD_DATA_TBL VALUES (0, 'item5'); 
INSERT INTO PAL_KORD_DATA_TBL VALUES (1, 'item2');
INSERT INTO PAL_KORD_DATA_TBL VALUES (1, 'item4'); 
INSERT INTO PAL_KORD_DATA_TBL VALUES (7, 'item1');
INSERT INTO PAL_KORD_DATA_TBL VALUES (7, 'item2');
INSERT INTO PAL_KORD_DATA_TBL VALUES (7, 'item3');
INSERT INTO PAL_KORD_DATA_TBL VALUES (7, 'item5');
INSERT INTO PAL_KORD_DATA_TBL VALUES (8, 'item1');
INSERT INTO PAL_KORD_DATA_TBL VALUES (8, 'item2');
INSERT INTO PAL_KORD_DATA_TBL VALUES (8, 'item3');

DROP TABLE #PAL_CONTROL_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL(
	"NAME" VARCHAR(100),
	"INTARGS" INTEGER, 
	"DOUBLEARGS" DOUBLE,
	"STRINGARGS" VARCHAR (100)
);
INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER', 2, null, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('TOPK', 5, null, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MEASURE_TYPE', 1, null, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MIN_SUPPORT', null, 0.1, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MIN_CONFIDENCE', null, 0.2, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('IS_USE_EPSILON', 0, null, null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('EPSILON', null, 0.1, null);


DROP TABLE PAL_KORD_RULES_TBL;
CREATE COLUMN TABLE PAL_KORD_RULES_TBL LIKE PAL_KORD_RULES_T;

DROP TABLE PAL_KORD_ANTE_ITEMS_TBL;
CREATE COLUMN TABLE PAL_KORD_ANTE_ITEMS_TBL LIKE PAL_KORD_ANTE_ITEMS_T;

DROP TABLE PAL_KORD_CONS_ITEMS_TBL;
CREATE COLUMN TABLE PAL_KORD_CONS_ITEMS_TBL LIKE PAL_KORD_CONS_ITEMS_T;

	
CALL "DM_PAL".PAL_KORD_PROC(PAL_KORD_DATA_TBL, #PAL_CONTROL_TBL, PAL_KORD_RULES_TBL, PAL_KORD_ANTE_ITEMS_TBL, PAL_KORD_CONS_ITEMS_TBL) with OVERVIEW;

SELECT * FROM PAL_KORD_RULES_TBL ORDER BY "MEASURE" DESC;
SELECT * FROM PAL_KORD_ANTE_ITEMS_TBL;
SELECT * FROM PAL_KORD_CONS_ITEMS_TBL;

Expected Result

PAL_KORD_RULES_TBL

PAL_KORD_ANTE_ITEMS_TBL

PAL_KORD_CONS_ITEMS_TBL