Fast Dynamic Time Warping — hanaml.FastDTW • hana.ml.r

hanaml.FastDTW is an R wrapper for SAP HANA PAL Fast DTW.

hanaml.FastDTW(
  data,
  radius,
  distance.level = NULL,
  minkowski.power = NULL,
  save.alignment = NULL,
  thread.ratio = NULL
)

Arguments

data

DataFrame
DataFrame containing the time series data, structured as follows:

ID of time series : INTEGER or CHARACTER
Timestamps of time series : INTEGER or CHARACTER
Columns for time series data : INTEGER or DOUBLE

radius

integer
Parameter of the Fast DTW algorithm that balances accuracy against runtime: larger values give a more accurate DTW distance but run slower. Must be positive. Defaults to 3.0.

distance.level

{"manhattan", "euclidean", "minkowski", "chebyshev", "cosine"}, optional
Specifies the method used to compute the distance between two points.

"manhattan" Manhattan norm (L1 norm)
"euclidean" Euclidean norm (L2 norm)
"minkowski" Minkowski norm (p-norm)
"chebyshev" Chebyshev norm (maximum norm)
"cosine" cosine similarity

Defaults to "euclidean".
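
To illustrate what these options compute, here is a minimal local sketch in plain R. The helper point_distance is hypothetical and not part of hana.ml.r. The cosine branch assumes the distance is taken as 1 minus the cosine similarity, which the text above does not confirm.

```r
# Pointwise distance between two numeric vectors of equal length.
# `point_distance` is a hypothetical helper, not part of hana.ml.r.
point_distance <- function(x, y, method = "euclidean", p = 3) {
  d <- abs(x - y)
  switch(method,
    manhattan = sum(d),              # L1 norm
    euclidean = sqrt(sum(d^2)),      # L2 norm
    minkowski = sum(d^p)^(1 / p),    # p-norm
    chebyshev = max(d),              # maximum norm
    # Assumption: cosine distance taken as 1 - cosine similarity.
    cosine    = 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))),
    stop("unknown method: ", method)
  )
}

a <- c(1, 5.2)  # (ATTR1, ATTR2) of series 1 at timestamp 1 in the Examples
b <- c(7, 2.0)  # (ATTR1, ATTR2) of series 2 at timestamp 1 in the Examples
point_distance(a, b, "manhattan")  # about 9.2
point_distance(a, b, "euclidean")  # about 6.8
point_distance(a, b, "chebyshev")  # 6
```

With p = 2 the Minkowski branch gives the same value as the Euclidean one, and with p = 1 the same as the Manhattan one.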

minkowski.power

double, optional
Only valid when distance.level is "minkowski".
Specifies the power p of the Minkowski p-norm.
Defaults to 3.0.

save.alignment

logical, optional
Specifies whether or not to output alignment information.
Defaults to FALSE.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads. Values between 0 and 1 will use up to that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads used is then heuristically determined.
Defaults to -1.

Value

Returns a list of DataFrames.

DataFrame Result of Fast DTW, structured as follows:
- LEFT_<ID column name of input table> : ID of one time series
- RIGHT_<ID column name of input table> : ID of another time series
- DISTANCE: DTW distance of the two time series
DataFrame Alignment (optimal match) between the input time series, structured as follows:
- LEFT_<ID column name of input table> : ID of one time series
- RIGHT_<ID column name of input table> : ID of another time series
- LEFT_INDEX : Index of the matched timestamp in the time series whose ID is in the 1st column
- RIGHT_INDEX : Index of the matched timestamp in the time series whose ID is in the 2nd column
DataFrame Statistics for time series, structured as follows:
- STAT_NAME : Statistics name
- STAT_VALUE : Statistics value

Details

Dynamic Time Warping (DTW) is a method for measuring the similarity between two time series that may vary in speed. It can be used for pattern matching and anomaly detection.
Fast DTW is a modified version of DTW that accelerates the computation when the time series are very long. It recursively reduces the size of the time series, calculates the DTW path on the reduced version, and then refines that path on the original series. It may lose some accuracy in the DTW distance in exchange for faster computation.
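
For orientation, the exact (non-accelerated) DTW distance can be sketched in a few lines of plain R with the classic dynamic programming recurrence. This is not the PAL implementation and it omits the coarsening and refinement steps of Fast DTW. The helper dtw_distance is hypothetical, and the Euclidean pointwise distance and unit step pattern are assumptions. The input is the two series from the Examples section, held as local matrices rather than a HANA DataFrame.

```r
# Classic DTW distance between two multivariate series (rows = time steps).
# `dtw_distance` is a hypothetical helper, not the PAL implementation.
dtw_distance <- function(a, b) {
  n <- nrow(a)
  m <- nrow(b)
  # cost[i + 1, j + 1] = best cumulative cost of aligning a[1:i, ] with b[1:j, ]
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- sqrt(sum((a[i, ] - b[j, ])^2))            # Euclidean pointwise distance
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1],  # advance a only
                                    cost[i + 1, j],  # advance b only
                                    cost[i, j])      # advance both
    }
  }
  cost[n + 1, m + 1]
}

# Series 1 and 2 from the Examples section (columns ATTR1, ATTR2).
s1 <- cbind(c(1, 2, 3, 4, 5), c(5.2, 5.1, 2.0, 0.3, 1.2))
s2 <- cbind(c(7, 6, 1, 3, 2, 5, 4), c(2.0, 1.4, 0.9, 1.2, 10.2, 2.3, 4.5))
dtw_distance(s1, s2)  # about 29.2764, matching DISTANCE in the Examples section
```

On series this short, with radius = 5, the Fast DTW result in the Examples section coincides with this exact value. On long series the two may differ slightly.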

Examples

Input DataFrame:


> data$Collect()
   ID TIMESTAMP ATTR1 ATTR2
1   1         1     1   5.2
2   1         2     2   5.1
3   1         3     3   2.0
4   1         4     4   0.3
5   1         5     5   1.2
6   2         1     7   2.0
7   2         2     6   1.4
8   2         3     1   0.9
9   2         4     3   1.2
10  2         5     2  10.2
11  2         6     5   2.3
12  2         7     4   4.5

Call the function:


> result <- hanaml.FastDTW(data = data, radius = 5, thread.ratio = 1,
                           distance.level = "euclidean", save.alignment = TRUE)

Results:


> result[[1]]$Collect()
  LEFT_ID RIGHT_ID DISTANCE
1       1        2  29.2764

> result[[2]]$Collect()
  LEFT_ID RIGHT_ID LEFT_INDEX RIGHT_INDEX
1       1        2          0           0
2       1        2          1           1
3       1        2          2           2
4       1        2          2           3
5       1        2          2           4
6       1        2          3           5
7       1        2          4           6
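
As a sanity check on how the two result tables relate, the sketch below recomputes the distance locally from the alignment. It assumes that LEFT_INDEX and RIGHT_INDEX are 0-based positions within each series (they run from 0 to 4 and from 0 to 6 for series of length 5 and 7) and that DISTANCE is the sum of the Euclidean distances over the matched pairs. The values are copied from the output above, and all variable names are illustrative.

```r
# Input series from the example, as local matrices (columns ATTR1, ATTR2).
s1 <- cbind(c(1, 2, 3, 4, 5), c(5.2, 5.1, 2.0, 0.3, 1.2))
s2 <- cbind(c(7, 6, 1, 3, 2, 5, 4), c(2.0, 1.4, 0.9, 1.2, 10.2, 2.3, 4.5))

# Alignment copied from result[[2]]; assumed to be 0-based positions.
left_index  <- c(0, 1, 2, 2, 2, 3, 4)
right_index <- c(0, 1, 2, 3, 4, 5, 6)

# Euclidean distance of each matched pair (+1 converts to R's 1-based rows).
diffs <- s1[left_index + 1, , drop = FALSE] - s2[right_index + 1, , drop = FALSE]
step_cost <- sqrt(rowSums(diffs^2))
path_cost <- sum(step_cost)
path_cost  # about 29.2764, the DISTANCE in result[[1]]
```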