Category Influence

Analyze the influence of different categories of an influencer on the target.

Category influence measures how strongly each category of an influencer affects the target. It is computed from two basic quantities:

The category frequency: percentage of observations within this category.
The difference between the percentage of positive cases in this category and the percentage of positive cases in the whole population.
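
As an illustration only, the two quantities above can be sketched in Python. The function name category_stats, its arguments, and the sample data are hypothetical and not part of the product; the sketch assumes a binary target whose positive value is passed as positive.

```python
def category_stats(categories, targets, positive=1):
    """Return {category: (frequency, difference)} for a binary target.

    frequency  = share of all observations that fall in the category
    difference = positive rate in the category minus the overall positive rate
    """
    n = len(targets)
    overall_rate = sum(t == positive for t in targets) / n
    stats = {}
    for c in set(categories):
        # Target values of the observations that belong to category c
        in_cat = [t for cat, t in zip(categories, targets) if cat == c]
        frequency = len(in_cat) / n
        rate = sum(t == positive for t in in_cat) / len(in_cat)
        stats[c] = (frequency, rate - overall_rate)
    return stats


# Hypothetical sample: 10 observations, 4 of them positive
sample_categories = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
sample_targets = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(category_stats(sample_categories, sample_targets))
```

In this sample, category A contains 40% of the observations and has a positive rate above the overall rate, while category B has a positive rate below it.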

How to interpret the influence?

The higher the absolute value of the influence, the stronger the effect of the category on the target. Categories with values equal or close to 0 have little or no influence on the target.

The influence of a category can be positive or negative:

Categories with positive values are categories where observations are more likely to be in the positive category of the target: the percentage of positive targets within this category is above the percentage of positive targets in the whole data source.
Categories with negative values are categories where observations are more likely to be in the negative category of the target: the percentage of positive targets within this category is below the percentage of positive targets in the whole data source.

The influence is computed for each category and provided by the engine using this formula:

Influence(C) = NP(C) * Frequency(C) / NC

where:

NC = Normalization Constant = (target_key_frequency) * (1 – target_key_frequency).
NP(C) = Profit(most_frequent_target_category) * P(most_frequent_target_category|C) + Profit(least_frequent_target_category) * P(least_frequent_target_category|C).
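The profit values satisfy the following constraint, which makes the expected profit over the whole population zero: Profit(most_frequent_target_category) * P(most_frequent_target_category) + Profit(least_frequent_target_category) * P(least_frequent_target_category) = 0.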
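
The formula above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the engine's implementation. It assumes that the target key is the least frequent target category, with frequency p, and it picks one pair of profit values that satisfies the zero-sum constraint: Profit(least_frequent_target_category) = 1 - p and Profit(most_frequent_target_category) = -p. The function name category_influence and the sample data are hypothetical.

```python
from collections import Counter


def category_influence(categories, targets, target_key=1):
    """Influence(C) = NP(C) * Frequency(C) / NC for each category C."""
    n = len(targets)
    p = sum(t == target_key for t in targets) / n  # target_key_frequency
    nc = p * (1 - p)                               # normalization constant NC
    if nc == 0:
        raise ValueError("the target takes a single value; NC is zero")

    # Assumed profit values; they satisfy
    # profit_most * (1 - p) + profit_least * p == 0
    profit_least = 1 - p   # profit of the target key (least frequent)
    profit_most = -p       # profit of the other target category

    counts = Counter(categories)
    key_counts = Counter(c for c, t in zip(categories, targets) if t == target_key)

    influence = {}
    for c, n_c in counts.items():
        frequency = n_c / n
        p_key = key_counts[c] / n_c  # P(least_frequent_target_category | C)
        np_c = profit_most * (1 - p_key) + profit_least * p_key
        influence[c] = np_c * frequency / nc
    return influence


# Hypothetical sample: 10 observations, 4 of them in the target key
demo_categories = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
demo_targets = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(category_influence(demo_categories, demo_targets))
```

With this choice of profits, NP(C) reduces to P(target key | C) - p, the difference between the positive rate in the category and the positive rate in the whole population, so the sign of the influence follows the interpretation given earlier.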