abc_analysis

hana_ml.algorithms.pal.abc_analysis.abc_analysis(data, key=None, percent_A=None, percent_B=None, percent_C=None, revenue=None, thread_ratio=None)

ABC analysis classifies objects (such as customers, employees, or products) by a particular measure (such as revenue or profit). It assumes that the items in an organization's inventory are not of equal value, so they can be grouped into three categories (A, B, and C) by estimated importance. 'A' items are the most important. 'B' items are of medium importance: less important than 'A' items but more important than 'C' items. 'C' items are the least important. An example of an ABC classification is as follows:

'A' items - 20% of the items (customers) account for 70% of the revenue.
'B' items - 30% of the items (customers) account for 20% of the revenue.
'C' items - 50% of the items (customers) account for 10% of the revenue.
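
The classification idea above can be sketched in plain Python. This is only an illustration and not the PAL implementation: the helper name abc_classify, the sample revenue figures, and the boundary rule (an item whose cumulative share crosses a threshold falls into the next class) are all assumptions made for this example.

```python
def abc_classify(values, percent_a, percent_b):
    """Label each item 'A', 'B', or 'C' by its cumulative share of the total.

    Hypothetical helper for illustration; not part of hana_ml.
    values: dict mapping item ID -> non-negative measure (e.g. revenue).
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("the total of all values must be positive")

    # Rank items from the largest to the smallest contribution.
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)

    labels = {}
    running = 0.0
    for item, value in ranked:
        running += value
        share = running / total  # cumulative share including this item
        if share <= percent_a:
            labels[item] = "A"
        elif share <= percent_a + percent_b:
            labels[item] = "B"
        else:
            labels[item] = "C"
    return labels


# Made-up revenue per customer (total = 100), for illustration only.
revenue = {"c1": 50, "c2": 15, "c3": 12, "c4": 8, "c5": 6,
           "c6": 4, "c7": 2, "c8": 1, "c9": 1, "c10": 1}

labels = abc_classify(revenue, percent_a=0.7, percent_b=0.2)
```

With these made-up figures, the two largest customers land in class A, the next two in class B, and the remaining six in class C. The actual abc_analysis() function may treat items at a class boundary differently.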

Parameters:

data : DataFrame

The input data.

key : str, optional

Name of the ID column.

Defaults to the index column of data (i.e. data.index) if it is set.

revenue : str, optional

Name of the column containing the revenue (or profit) values.

If not given, the input DataFrame must have exactly two columns.

Defaults to the first non-key column.

percent_A : float

The proportion of the total measure (for example, revenue) allocated to class A.

percent_B : float

The proportion of the total measure (for example, revenue) allocated to class B.

percent_C : float

The proportion of the total measure (for example, revenue) allocated to class C.

thread_ratio : float, optional

Specifies the proportion of available threads to use, from 0 to 1. A value of 0 uses a single thread, and a value of 1 uses all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.

Defaults to 0.

Returns:

DataFrame: The result after partitioning the data into three categories.

Examples

Input DataFrame:

>>> df.collect()
      ITEM    VALUE
0    item1     15.4
1    item2    200.4
...
8    item9    96.15
9   item10      9.4

Perform abc_analysis():

>>> res = abc_analysis(data=df, key='ITEM', thread_ratio=0.3,
                       percent_A=0.7, percent_B=0.2, percent_C=0.1)
>>> res.collect()
   ABC_CLASS         ITEM
0          A        item3
1          A        item2
...
8          C        item8
9          C       item10