Model Statistics Component

Use the Model Statistics component to generate performance statistics for two-class problems in all scenarios (HANA and non-HANA). Visualize and share the results in a range of charts. Use the component with the Model Compare component to compare two or more models and discover the best one for a predictive problem.

Calculate Performance Statistics

Model Statistics is a component that calculates performance statistics on datasets that are generated by algorithms. It can do so for two algorithm types: classification and regression. In addition, you can configure the component to generate performance statistics for the Train, Validate, and Test datasets and for selected KPIs.

Two-Class Problems

The component works only with two-class problems. A two-class problem is a business problem with a binary outcome, which means that it classifies the elements of a given dataset into two groups by a classification rule.

One example is in churn modeling for a business with a subscription service. In such a case, the two-class problem is to identify subscribers who will stay with the service, and those who will leave.

Another example is fraud detection at a financial institution, where the two-class problem is to identify which transactions are fraudulent, and which are not.
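A two-class classification rule can be sketched as a simple threshold on a model score. The function name, score values, and threshold below are illustrative assumptions for the churn example, not part of the Model Statistics component:

```python
# Minimal sketch of a two-class (binary) classification rule for churn.
# The threshold of 0.5 and the class labels are illustrative assumptions.

def classify_churn(subscriber_score: float, threshold: float = 0.5) -> str:
    """Assign a subscriber to one of two classes based on a model score."""
    return "will_leave" if subscriber_score >= threshold else "will_stay"

# Apply the rule to a few hypothetical subscriber scores.
labels = [classify_churn(score) for score in (0.12, 0.87, 0.50)]
```

Every record falls into exactly one of the two groups, which is what makes the problem two-class.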

How To Ensure a Strong Predictive Quality (KI)

You must ensure that the predictive quality (KI) of the model is strong. For example, a KI of zero means that the model is not trained well and inspires no confidence, since it is essentially equivalent to a random model.

The KI is directly linked to the amount of information available to predict the target. Therefore, you can improve the KI by increasing the number of useful variables in the model in the following ways:

  • Use all variables available.
  • Use your domain knowledge to find other sources of information.
  • Build variables from the existing ones with data manipulations.
  • Use combinations of variables by increasing the polynomial degree.

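The KI itself is computed internally by the tool. As a hedged illustration of the general idea, the sketch below computes a Gini-like coefficient (2 × AUC − 1), which shares the KI's behavior at the extremes: 0 for a random model and 1 for a perfect one. This is an analogy, not the component's actual formula:

```python
# Illustrative sketch only: a Gini-like quality indicator, NOT the tool's KI formula.

def auc(scores_pos, scores_neg):
    """Probability that a positive record scores above a negative one (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def gini_like(scores_pos, scores_neg):
    """0.0 for a random model, 1.0 for a perfectly separating model."""
    return 2 * auc(scores_pos, scores_neg) - 1
```

A model whose positive-class scores always exceed its negative-class scores gets the maximum value; a model that scores both classes identically gets zero, matching the "equivalent to a random model" case described above.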
Charts in Model Statistics

In the Model Statistics component, you can generate and share charts that visualize the performance of classification and regression algorithms.

Classification charts:
  • Gain(Profit): Visualizes the gain or profit that the model realizes for a selected percentage of the target population. On the chart, the y-axis shows the gain or profit, and the x-axis shows the percentage.
  • Lift: Visualizes the amount of lift the trained model provides compared to a random model. It allows you to examine the difference between a perfect model, a random model, and the model you created. On the chart, the y-axis shows the lift, and the x-axis shows the percentage.
  • Standardized (KS): Visualizes the distance between the distribution functions of the two classes in binary classification (for example, Class 1 and Class 0). The score that generates the greatest separability between the functions is considered the threshold value for accepting or rejecting the target. Separability measures how well the model distinguishes between the records of the two classes: even when there are minor deviations in the input data, the model should still identify these patterns and differentiate between the two. In this way, separability is a metric of model quality; the predictive model that produces the greatest separability between the two distributions is considered the superior model.
  • Receiver Operating Characteristic (ROC): Visualizes the ROC curve, which is generated by plotting the true positive rate (or sensitivity) at various threshold settings against the false positive rate (or fall-out, calculated as 1 - specificity). The ROC curve is used to derive the Area Under the Curve (AUC) metric. On the chart, the y-axis shows the sensitivity, and the x-axis shows 1 - specificity.
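The Gain and Lift values plotted in the first two charts can be illustrated with a small sketch. The scored records below are fabricated for illustration; the logic ranks records by model score and computes, for each population percentage, the cumulative share of targets captured (gain) and its ratio to the random baseline (lift):

```python
# Hypothetical scored records: (model_score, actual_target) pairs, fabricated for illustration.
records = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.4, 0),
           (0.35, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]

def gain_and_lift(records):
    """Compute (population %, gain, lift) points from (score, target) pairs."""
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    total_pos = sum(target for _, target in ranked)
    points, cum_pos = [], 0
    for i, (_, target) in enumerate(ranked, start=1):
        cum_pos += target
        pct = i / len(ranked)        # x-axis: percentage of the population selected
        gain = cum_pos / total_pos   # y-axis of the Gain chart
        lift = gain / pct            # y-axis of the Lift chart (random model = 1.0)
        points.append((pct, gain, lift))
    return points

points = gain_and_lift(records)
```

At 20% of the population this toy model has already captured half the targets, so its lift there is 2.5 times the random baseline; at 100% the gain is 1.0 and the lift falls back to 1.0, as on any gain or lift chart.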
Regression chart:
  • Model Accuracy: Visualizes how many records were correctly predicted in comparison to the actual target values.
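For the regression case, one common way to summarize how closely predictions match the actual target values is the coefficient of determination (R²). This is an illustrative metric, not necessarily the exact figure the Model Accuracy chart reports:

```python
# Illustrative sketch: R² as a summary of predicted-vs-actual agreement.
# This is an assumption for illustration, not the chart's documented formula.

def r_squared(actual, predicted):
    """1.0 for perfect predictions; 0.0 for a model no better than the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual error
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total variance
    return 1 - ss_res / ss_tot
```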
Interaction with the Model Compare Component

You can use the Model Statistics component with the Model Compare component to determine the best algorithm for your predictive problem. First, the Model Statistics component calculates the performance statistics for either the classification or regression algorithm type. Then, the Model Compare component compares the calculated performance statistics to pick the best of the algorithms run at execution.

Note that any configuration changes in the Model Statistics component also affect the Model Compare component.

When rendering the charts during interaction with Model Compare, the Model Statistics component overlays the partitions on top of each other and displays different results per partition. The Model Compare component does the same because both components use the same data. Therefore, ensure that you configure the KPIs identically in both components.

Interaction with the Partition Component

When the Partition component precedes the Model Statistics component in an analysis chain, you have the option to use three different partitions: Train, Test, and Validate. If the Partition component is not included, the Model Statistics component displays a set of statistics and charts for the Train partition only.
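The three-way split a Partition component produces can be sketched as follows; the 60/20/20 ratios and the fixed seed are illustrative assumptions, not the component's documented defaults:

```python
import random

def partition(rows, train_frac=0.6, validate_frac=0.2, seed=42):
    """Shuffle rows and split them into Train, Validate, and Test partitions.
    The 60/20/20 split and fixed seed are illustrative assumptions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # deterministic shuffle for reproducibility
    n_train = int(len(rows) * train_frac)
    n_validate = int(len(rows) * validate_frac)
    return {
        "Train": rows[:n_train],
        "Validate": rows[n_train:n_train + n_validate],
        "Test": rows[n_train + n_validate:],  # remainder
    }

parts = partition(range(100))
```

Each row lands in exactly one partition, so statistics computed per partition (as in the Model Statistics charts) never mix training records with held-out ones.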