Distance-Based Failure Analysis Using Earth Mover’s Distance (EMD)

What Does the Algorithm Do?

The algorithm solves a linear optimization problem, in this case a transportation problem. A figurative example illustrates what this means: Suppose you want to transform a sandcastle A into a sandcastle B. Both sandcastles consist of the same amount of sand. EMD measures how much sand you have to transport and over which distance.

The algorithm compares the locations of both sandcastles: Are they located close to each other, or does the sand have to be transported a long way from sandcastle A to sandcastle B? The algorithm also compares the shapes of the two sandcastles. If they have a similar shape, little or no rebuilding work is needed. If their shapes differ significantly, considerable effort is required to rebuild sandcastle A so that it looks like sandcastle B. In short, the closer and more similar the sandcastles are, the lower the work effort and transport costs are.

In the example mentioned above, the fingerprint of battery A is compared to the fingerprint of a well-functioning battery B. The lower the score calculated with EMD, the more similar the fingerprints are (battery A is working like battery B). The higher the score, the more different the fingerprints are.
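
The sandcastle picture can be made concrete with a minimal sketch. It assumes that each fingerprint is a one-dimensional histogram with equally wide bins, in which case the transportation problem reduces to summing the sand that has to be carried across each bin boundary. The function name emd_1d and the sample fingerprints are hypothetical, and the sketch is not the implementation used by the machine learning engine.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b, bin_width=1.0):
    """Earth mover's distance between two 1-D histograms (illustrative only)."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must contain some mass")
    # Normalize so that both "sandcastles" hold the same amount of sand.
    p = [v / total_a for v in hist_a]
    q = [v / total_b for v in hist_b]
    # The running surplus after each bin is the sand that still has to be
    # carried on to the next bin; its absolute value is the work done there.
    carried = accumulate(a - b for a, b in zip(p, q))
    return bin_width * sum(abs(c) for c in carried)


# Hypothetical fingerprints: B is the well-functioning reference battery.
reference = [0, 4, 4, 0, 0]
similar = [0, 3, 5, 0, 0]   # nearly the same shape and location
shifted = [0, 0, 0, 4, 4]   # same shape, but two bins further away

print(emd_1d(reference, similar))  # 0.125
print(emd_1d(reference, shifted))  # 2.0
```

The similar fingerprint yields a low score and the shifted fingerprint a high one, which matches the interpretation of the score given above.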

Model Configuration

To configure a model for distance-based failure analysis using earth mover's distance, use the REST APIs or configuration UIs for the machine learning engine. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.

Data Preparation for Model Training and Scoring

Before training and scoring, data scientists need to configure a model. In the configuration, they need to specify the names of one or more columns that contain the values by which the data is to be grouped. You can define as many grouping columns as your business case requires. The grouping columns are used in scoring only. In training, the columns named as grouping columns are excluded from the training data.
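
The role of the grouping columns can be sketched as follows. This is one reading of the description above, not the engine's actual behavior: training sees only the data columns, while scoring collects the rows per combination of grouping values. The column names battery_id and voltage and the helper functions are hypothetical.

```python
from collections import defaultdict


def drop_grouping_columns(rows, grouping_columns):
    """Training view: every row without its grouping columns."""
    return [
        {name: value for name, value in row.items() if name not in grouping_columns}
        for row in rows
    ]


def group_for_scoring(rows, grouping_columns):
    """Scoring view: data columns collected per combination of grouping values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in grouping_columns)
        data = {n: v for n, v in row.items() if n not in grouping_columns}
        groups[key].append(data)
    return dict(groups)


# Hypothetical input: one grouping column and one data column.
rows = [
    {"battery_id": "A", "voltage": 3.9},
    {"battery_id": "A", "voltage": 4.1},
    {"battery_id": "B", "voltage": 3.2},
]
grouping_columns = ["battery_id"]

training_rows = drop_grouping_columns(rows, grouping_columns)
scoring_groups = group_for_scoring(rows, grouping_columns)
```

With this input, training_rows contains three rows that hold only the voltage column, and scoring_groups contains one entry for battery A and one for battery B.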

To train a model, data scientists also need to specify the number of bins and the expected minimum and maximum values of the data columns. The minimum and maximum values are used for binning.
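
As an illustration of how the number of bins and the expected minimum and maximum values could interact, the following sketch divides the expected range into equally wide bins and counts the values that fall into each bin. Equal-width bins and the clamping of out-of-range values into the first or last bin are assumptions made for this example; the source does not state how the engine handles such values. The function name bin_values is hypothetical.

```python
def bin_values(values, n_bins, min_value, max_value):
    """Count values per equal-width bin between min_value and max_value."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    width = (max_value - min_value) / n_bins
    counts = [0] * n_bins
    for value in values:
        index = int((value - min_value) / width)
        # Assumption: values outside the expected range are clamped
        # into the first or last bin instead of being discarded.
        index = min(max(index, 0), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sensor readings with an expected range of 0 to 10.
readings = [0.5, 1.9, 2.0, 7.3, 10.0, 12.0, -1.0]
print(bin_values(readings, n_bins=5, min_value=0.0, max_value=10.0))
# [3, 1, 0, 1, 2]
```

A histogram produced in this way is the kind of fingerprint that a distance measure such as EMD can compare.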

Model Training

To train a model for distance-based failure analysis using earth mover's distance, use the REST APIs or configuration UIs for the machine learning engine. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.

Model Scoring

To score a model for distance-based failure analysis using earth mover's distance, use the REST APIs or configuration UIs for the machine learning engine. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.