Anomaly Detection with Principal Component Analysis (PCA)

Principal component analysis (PCA) can be used to detect anomalies in multivariate sensor data.

What Does the Algorithm Do?

The algorithm transforms data readings from the existing coordinate system into a new coordinate system. This concept is depicted in the following sequence of figures:

The closer data readings are to the center of the new coordinate system, the closer these readings are to an optimum value.

Model Configuration

To configure a model for anomaly detection with PCA, use the REST APIs or configuration UIs for the machine learning engine. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.

Data Preparation for Model Training and Scoring

Do not use all observations for model training. Instead of random observations, use observations from assets that are known to behave normally.

PCA works with sensor data that has the same timestamp. If the sensor data that you want to include in the analysis does not have the same timestamp, you need to aggregate this data before PCA is started. You also need to aggregate data before PCA is started if you want to analyze sums, average values, or maximum and minimum values.
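As a minimal sketch of such an aggregation, the following Python code averages the readings of each sensor within one-minute buckets, so that every remaining timestamp has exactly one value per sensor. The function name aggregate_mean, the sensor names, the bucket size, and the sample values are assumptions made for illustration only and are not part of the product.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def aggregate_mean(readings, bucket_seconds=60):
    """Average readings per sensor within fixed-width time buckets.

    readings: iterable of (timestamp, sensor_name, value) tuples with
    timezone-aware timestamps.
    Returns {bucket_start: {sensor_name: mean_value}} and keeps only the
    buckets in which every sensor has at least one reading.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    sensors = set()
    for ts, sensor, value in readings:
        epoch = int(ts.timestamp())
        # Round the timestamp down to the start of its bucket.
        start = datetime.fromtimestamp(epoch - epoch % bucket_seconds, tz=timezone.utc)
        buckets[start][sensor].append(value)
        sensors.add(sensor)
    return {
        start: {s: mean(values[s]) for s in sorted(sensors)}
        for start, values in sorted(buckets.items())
        if set(values) == sensors
    }


def at(minute, second):
    # Hypothetical helper that builds sample timestamps on one made-up day.
    return datetime(2024, 1, 1, 12, minute, second, tzinfo=timezone.utc)


readings = [
    (at(0, 5), "temperature", 70.0),
    (at(0, 35), "temperature", 72.0),
    (at(0, 20), "pressure", 1.0),
    (at(1, 10), "temperature", 74.0),  # no pressure reading in this bucket
]
aggregated = aggregate_mean(readings)
```

In this sketch, the second bucket is dropped because it contains no pressure reading. Whether incomplete buckets should be dropped or filled depends on the use case.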

Model Training

Model training for PCA means calculating the eigenvectors and eigenvalues of the covariance matrix of the training data.
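The calculation can be sketched with NumPy as follows. This is an illustration under assumptions, not the implementation used by the machine learning engine: the function name train_pca and the synthetic training data are made up, and the sketch centers the data but does not scale it.

```python
import numpy as np


def train_pca(X):
    """Return the mean, eigenvalues, and eigenvectors of the training data.

    X: array of shape (n_observations, n_sensors) that contains readings
    from normal behavior only.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh is suited to symmetric matrices such as a covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Sort the components by descending variance.
    order = np.argsort(eigenvalues)[::-1]
    return mean, eigenvalues[order], eigenvectors[:, order]


# Synthetic, made-up training data: two strongly correlated sensors.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X_train = np.column_stack([
    base + 0.1 * rng.normal(size=500),
    2.0 * base + 0.1 * rng.normal(size=500),
])
mean, eigenvalues, eigenvectors = train_pca(X_train)
```

The columns of eigenvectors are the axes of the new coordinate system, and each eigenvalue is the variance of the training data along the corresponding axis.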

Depending on the individual use case, PCA can be applied to any kind of reading, be it one-second-interval readings of sensors or aggregated sensor readings.

To train a model for anomaly detection with PCA, use the REST APIs or configuration UIs for the machine learning engine. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.

Model Scoring

Anomaly Score

The anomaly score is calculated using the Mahalanobis distance between a sensor reading and the mean of all readings, which is the center of the transformed coordinate system.
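A minimal sketch of such a score is shown below. It assumes that all principal components are used and that the mean, eigenvalues, and eigenvectors come from model training. The function name mahalanobis_score and the example model values are hypothetical.

```python
import numpy as np


def mahalanobis_score(x, mean, eigenvalues, eigenvectors):
    """Mahalanobis distance of reading x from the training mean.

    Projects the centered reading onto the eigenvectors and scales each
    coordinate by the standard deviation along that component.
    """
    z = (x - mean) @ eigenvectors
    return float(np.sqrt(np.sum(z ** 2 / eigenvalues)))


# Hypothetical trained model: uncorrelated sensors with variances 4 and 1.
mean = np.array([10.0, 5.0])
eigenvalues = np.array([4.0, 1.0])
eigenvectors = np.eye(2)

# A reading at the center of the new coordinate system.
score_center = mahalanobis_score(np.array([10.0, 5.0]), mean, eigenvalues, eigenvectors)
# A reading two standard deviations away along the first component.
score_far = mahalanobis_score(np.array([14.0, 5.0]), mean, eigenvalues, eigenvectors)
```

In this example, the reading at the center has a score of 0 and the second reading has a score of 2, because it lies two standard deviations from the mean along the first component.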

Smoothing

In some cases, anomaly scores are high for only a few seconds. This is usually not critical and can represent normal behavior (for example, when a machine is started). Smoothing the anomaly score prevents anomaly alerts caused by such sporadic anomalies. The smoothing is done using a running median.
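The effect of a running median can be sketched as follows. The window size of five values, the trailing (rather than centered) window, the function name running_median, and the sample scores are assumptions made for illustration.

```python
from collections import deque
from statistics import median


def running_median(scores, window=5):
    """Smooth scores with a trailing running median over `window` values."""
    recent = deque(maxlen=window)  # holds only the latest `window` scores
    smoothed = []
    for score in scores:
        recent.append(score)
        smoothed.append(median(recent))
    return smoothed


# Made-up anomaly scores: one short spike, then a sustained increase.
raw = [1.0, 1.2, 9.0, 1.1, 0.9, 1.0, 8.0, 8.5, 9.0, 8.8]
smoothed = running_median(raw, window=5)
```

In this sketch, the single spike of 9.0 does not appear in the smoothed scores, whereas the sustained increase at the end raises the smoothed score once the high values make up the majority of the window.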

To score a model for anomaly detection with PCA, use the REST APIs or configuration UIs for the machine learning engine. For more information, see the chapters Managing Machine Learning Engine Using Configuration UIs and Managing Machine Learning Engine Using REST APIs in the guide Configuring SAP Predictive Maintenance and Service, on-premise edition 1.0.