Failure Prediction Using Tree Ensemble Classifier (TEC)

Using historical sensor data records in which past failures are known, a tree ensemble model can learn to predict future system failures.

What Does the Algorithm Do?

The algorithm trains a boosted decision tree model: a series of decision trees that together encode how the characteristics of data records relate to failure. During training, each tree learns to assign a record, based on its feature values, to one of a set of record groups (the leaves of the tree). The record then receives the evidence weight of that group, which indicates evidence for or against the record belonging to a failing system. The model aggregates the evidence weights from all trees and outputs a probability of failure. This probability expresses how certain the model is that the data record indicates a failing system.
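The following Python code is a minimal sketch of how such a model could be represented and evaluated. The `Node` structure, the feature names, and the weights are hypothetical. The logistic function that turns the summed evidence weights into a probability is an assumption, because TEC learns its own mapping from weights to probabilities during training.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    # Inner node: tests `feature` against `threshold`.
    # Leaf (feature is None): holds the evidence weight of its record group;
    # positive = evidence for failure, negative = evidence against it.
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when value < threshold
    right: Optional["Node"] = None  # followed when value >= threshold
    weight: float = 0.0


def leaf_weight(tree: Node, record: Dict[str, float]) -> float:
    """Route the record down the tree to its group (leaf) and return its weight."""
    node = tree
    while node.feature is not None:
        node = node.left if record[node.feature] < node.threshold else node.right
    return node.weight


def failure_probability(trees: List[Node], record: Dict[str, float],
                        bias: float = 0.0) -> float:
    """Aggregate the evidence weights of all trees and map the sum to a probability."""
    total = bias + sum(leaf_weight(tree, record) for tree in trees)
    return 1.0 / (1.0 + math.exp(-total))  # logistic mapping (assumed)


# Two hypothetical one-split trees.
trees = [
    Node(feature="temperature", threshold=80.0,
         left=Node(weight=-0.8), right=Node(weight=1.2)),
    Node(feature="vibration", threshold=0.5,
         left=Node(weight=-0.4), right=Node(weight=0.9)),
]
record = {"temperature": 85.0, "vibration": 0.3}
p = failure_probability(trees, record)  # summed weights: 1.2 - 0.4 = 0.8
print(round(p, 3))  # 0.69
```

Here the first tree contributes evidence for failure (high temperature) and the second contributes evidence against it (low vibration), so the resulting probability lands only moderately above 0.5.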

The following figure illustrates the tree ensemble model created by the algorithm:

Model Configuration

To configure a model for the tree ensemble classifier, use the REST APIs or configuration UIs for data science services. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.

Data Preparation for Model Training and Scoring

This algorithm is a supervised learning method. It therefore requires training data in which each record has a column indicating whether that record belongs to a regular or a failing system.

The algorithm does not handle time series data directly. To analyze time-dependent data, first preprocess the data in SAP HANA views to extract appropriate features from it.
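In the product, this preprocessing takes place in SAP HANA views. The following Python sketch only illustrates the idea: it turns raw time-stamped readings into one flat, labeled record per piece of equipment, using simple window statistics as features. The function names, the chosen statistics, the `failed` label column, and the sample data are all hypothetical, not features that TEC requires.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple

Reading = Tuple[int, float]  # (timestamp, sensor value)


def window_features(readings: List[Reading], window: int) -> Dict[str, float]:
    """Summarize the most recent `window` readings of one sensor as flat features."""
    values = [value for _, value in sorted(readings)][-window:]
    return {
        "mean": mean(values),
        "max": max(values),
        "std": pstdev(values),
        "trend": values[-1] - values[0],  # change across the window
    }


def build_training_set(series: Dict[str, List[Reading]], failed: Set[str],
                       window: int = 3) -> List[Dict[str, object]]:
    """One record per equipment: window features plus a 0/1 failure label column."""
    rows = []
    for equipment_id, readings in sorted(series.items()):
        row: Dict[str, object] = {"equipment_id": equipment_id}
        row.update(window_features(readings, window))
        row["failed"] = 1 if equipment_id in failed else 0
        rows.append(row)
    return rows


series = {
    "PUMP-1": [(1, 70.0), (2, 71.0), (3, 70.5), (4, 70.0)],
    "PUMP-2": [(1, 70.0), (2, 78.0), (3, 85.0), (4, 93.0)],
}
training = build_training_set(series, failed={"PUMP-2"})
```

Each resulting row no longer depends on the order of readings, which is the form a non-time-series classifier such as TEC expects; the `failed` column plays the role of the label described above.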

Model Training

Model training for TEC means using the provided historical training data to learn the following:
  • A series of decision trees, including decision thresholds
  • Evidence weights
  • An appropriate mapping of weights to probabilities of failure that represents the training data well
Together, these components make up the TEC model.
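The exact training procedure used by TEC is not described here. To illustrate how the three learned components fit together, the following sketch implements generic gradient boosting with log loss and second-order (Newton) leaf weights, in the style popularized by XGBoost, using one-split trees (decision stumps). It learns split thresholds, an evidence weight per record group, and a bias; the logistic mapping from summed weights to probability is an assumption. All names and parameters (`train`, `n_trees`, `learning_rate`, `lam`) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Record = Dict[str, float]
# A one-split tree: (feature, threshold, weight_left, weight_right).
Stump = Tuple[str, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stump_weight(stump: Stump, record: Record) -> float:
    feature, threshold, w_left, w_right = stump
    return w_left if record[feature] < threshold else w_right


def fit_stump(X: List[Record], grad: List[float], hess: List[float],
              lam: float) -> Optional[Stump]:
    """Choose the split with the best second-order gain; weights are Newton steps."""
    best, best_gain = None, -math.inf
    for feature in X[0]:
        for threshold in sorted({x[feature] for x in X}):
            left = [i for i, x in enumerate(X) if x[feature] < threshold]
            right = [i for i, x in enumerate(X) if x[feature] >= threshold]
            if not left or not right:
                continue
            g_l, h_l = sum(grad[i] for i in left), sum(hess[i] for i in left)
            g_r, h_r = sum(grad[i] for i in right), sum(hess[i] for i in right)
            gain = g_l ** 2 / (h_l + lam) + g_r ** 2 / (h_r + lam)
            if gain > best_gain:
                best_gain = gain
                best = (feature, threshold, -g_l / (h_l + lam), -g_r / (h_r + lam))
    return best


def train(X: List[Record], y: List[int], n_trees: int = 20,
          learning_rate: float = 0.3, lam: float = 1.0) -> Tuple[float, List[Stump]]:
    """Gradient boosting with log loss; returns (bias, stumps)."""
    rate = sum(y) / len(y)
    if not 0 < rate < 1:
        raise ValueError("training data needs both regular and failing records")
    bias = math.log(rate / (1 - rate))  # log-odds of the failure base rate
    scores = [bias] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        p = [sigmoid(s) for s in scores]
        grad = [p_i - y_i for p_i, y_i in zip(p, y)]  # d(log loss)/d(score)
        hess = [p_i * (1 - p_i) for p_i in p]
        stump = fit_stump(X, grad, hess, lam)
        if stump is None:
            break
        feature, threshold, w_l, w_r = stump
        stump = (feature, threshold, learning_rate * w_l, learning_rate * w_r)
        stumps.append(stump)
        scores = [s + stump_weight(stump, x) for s, x in zip(scores, X)]
    return bias, stumps


def predict_proba(bias: float, stumps: List[Stump], record: Record) -> float:
    return sigmoid(bias + sum(stump_weight(s, record) for s in stumps))


X = [{"temperature": t} for t in (60.0, 65.0, 70.0, 88.0, 92.0, 95.0)]
y = [0, 0, 0, 1, 1, 1]
bias, stumps = train(X, y)
```

Each boosting round fits a new tree to the remaining error of the ensemble so far, which is what makes the trees a series rather than independent models. Stumps keep the sketch short; a real model would typically grow deeper trees.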

The aim is to find a model that represents the training data set well.

To train a model for the tree ensemble classifier, use the REST APIs or configuration UIs for data science services. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.

Model Scoring

To score a record, the TEC model first determines, for each decision tree, the group that the record belongs to based on the record's feature values. Next, it aggregates the evidence weights that all trees assign to this record. Based on the resulting total weight, the algorithm determines whether or not the record indicates a system failure, together with the certainty of this prediction. Both pieces of information are written to the SAP HANA database as a health score for further use in SAP Predictive Maintenance and Service, on-premise edition.
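The scoring steps can be sketched as follows. The dictionary-based tree format, the 0.5 decision threshold, the logistic mapping, and the definition of certainty as the probability of the predicted outcome are assumptions for illustration; writing the health score to SAP HANA is not shown.

```python
import math
from typing import Dict, List

Record = Dict[str, float]


def leaf_weight(tree: dict, record: Record) -> float:
    """Walk from the root to the group (leaf) the record belongs to."""
    node = tree
    while "weight" not in node:
        branch = "left" if record[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["weight"]


def score(trees: List[dict], record: Record, bias: float = 0.0,
          decision_threshold: float = 0.5) -> dict:
    """Aggregate evidence weights, then derive a failure decision and its certainty."""
    total = bias + sum(leaf_weight(tree, record) for tree in trees)
    p_failure = 1.0 / (1.0 + math.exp(-total))  # assumed logistic mapping
    failing = p_failure >= decision_threshold
    return {
        "failure_predicted": failing,
        # Certainty of the stated outcome: P(failure) or P(regular), respectively.
        "certainty": p_failure if failing else 1.0 - p_failure,
    }


trees = [
    {"feature": "temperature", "threshold": 80.0,
     "left": {"weight": -0.8}, "right": {"weight": 1.2}},
    {"feature": "vibration", "threshold": 0.5,
     "left": {"weight": -0.4}, "right": {"weight": 0.9}},
]
hot = score(trees, {"temperature": 85.0, "vibration": 0.7})   # total weight 2.1
calm = score(trees, {"temperature": 70.0, "vibration": 0.2})  # total weight -1.2
```

Reporting the certainty of whichever outcome is predicted keeps the two output values consistent: a record predicted as regular with certainty 0.77 corresponds to a failure probability of about 0.23.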

To score data using a tree ensemble classifier model, use the REST APIs or configuration UIs for data science services. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.