Models with a Continuous Target

The default graphic displays the actual target values as a function of the predicted target values. Two curves are displayed: one for the Validation sub-set (blue line) and one for the hypothetical perfect model (Wizard; green line). The Validation curve gives the average actual target value for each predicted target value. For example, when the model predicts 35, the average actual value is 37. The Wizard curve is simply the line Y = X, on which every predicted value equals the actual value. The graph is a quick way to see model error: where the Validation curve moves far from the Wizard curve, the predicted values in that range are unreliable.

The graph is computed as follows:

  • about 20 segments or bins of predicted values are built. Each of these segments represents roughly 5% of the population.
  • for each of these segments, some basic statistics are computed: the mean of the predicted values in the segment (SegmentMean), the mean of the associated actual target (TargetMean) and the variance of this target within that segment (TargetVariance). For example, for predicted values in [17; 19], the segment mean would be 18.5, the actual target mean would be 20.5 and the actual target variance would be 9. In this case we could say that, when the predicted value is between 17 and 19, the model slightly underestimates the actual value.
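
The two steps above can be sketched in Python. This is only an illustration, not the product's implementation: the function name segment_stats is hypothetical, and the equal-frequency binning and the use of the population variance are assumptions, since the exact binning rule is not specified here.

```python
from statistics import fmean, pvariance


def segment_stats(predicted, actual, n_segments=20):
    """Bin rows by predicted value and compute per-segment statistics.

    Assumptions (not confirmed by the documentation): segments hold
    roughly equal numbers of rows, and TargetVariance is the
    population variance of the actual target inside the segment.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = sorted(zip(predicted, actual))  # order rows by predicted value
    n = len(pairs)
    n_segments = min(n_segments, n)  # never create an empty segment

    stats = []
    for i in range(n_segments):
        # Integer arithmetic keeps segment sizes within one row of each other.
        lo = i * n // n_segments
        hi = (i + 1) * n // n_segments
        preds = [p for p, _ in pairs[lo:hi]]
        targets = [a for _, a in pairs[lo:hi]]
        stats.append({
            "SegmentMean": fmean(preds),          # X coordinate of the dot
            "TargetMean": fmean(targets),         # Y coordinate of the dot
            "TargetVariance": pvariance(targets),
        })
    return stats
```

With 20 segments, each entry of the returned list describes about 5% of the rows. A segment whose TargetMean is above its SegmentMean indicates that the model underestimates the target in that range of predictions.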

For each curve, each dot on the graph is plotted at the segment mean on the X-axis and the target mean on the Y-axis.

The blue area represents the expected deviation of the current model: it shows where about 70% of the actual values are expected to be.

Note that this prediction range (half the "width" of the blue area) is equal to one standard deviation of the observed actual target for a given segment of predicted values. In other words, in the case of a Gaussian distribution, about 70% of the actual points should be in the blue area (keep in mind that this is a theoretical percentage that may not be observed every time). The extreme values for prediction ranges are {TargetMean - sqrt(TargetVariance); TargetMean + sqrt(TargetVariance)}. The default setting for the type of curve parameter is Predicted vs Actual.
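
As a minimal sketch, the extreme values given above can be computed for the [17; 19] example segment (TargetMean 20.5, TargetVariance 9). The function names prediction_range and gaussian_coverage are hypothetical and used only for illustration. The second function evaluates the theoretical share of a Gaussian distribution lying within a given number of standard deviations of its mean, which is where the "about 70%" figure comes from.

```python
import math


def prediction_range(target_mean, target_variance):
    """Return (lower, upper) = TargetMean -/+ sqrt(TargetVariance)."""
    std = math.sqrt(target_variance)
    return target_mean - std, target_mean + std


def gaussian_coverage(k=1.0):
    """Share of a Gaussian distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


lower, upper = prediction_range(20.5, 9)   # values from the [17; 19] example
print(lower, upper)                        # 17.5 23.5
print(round(gaussian_coverage(1.0), 3))    # 0.683
```

For that segment the blue area would therefore span roughly 17.5 to 23.5 on the Y-axis, under the Gaussian assumption stated above.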

Note

sqrt(TargetVariance) is the standard deviation, and TargetMean +/- the standard deviation gives the bounds of the confidence interval.