Generic Dynamic Time Warping — hanaml.DTW • hana.ml.r

hanaml.DTW is an R wrapper for SAP HANA PAL DTW.

hanaml.DTW(
  query.data,
  ref.data,
  radius = NULL,
  distance.level = NULL,
  minkowski.power = NULL,
  alignment.method = NULL,
  step.pattern = NULL,
  save.alignment = NULL,
  thread.ratio = NULL
)

Arguments

query.data

DataFrame
DataFrame containing the time-series data for the query, expected to be structured as follows:

1st column: ID of query series, type INTEGER, VARCHAR or NVARCHAR
2nd column: Order of time series, type INTEGER, VARCHAR or NVARCHAR
Other columns: Series data, type INTEGER, DOUBLE or DECIMAL(p,s)

ref.data

DataFrame
DataFrame containing the time-series data for reference, expected to be structured as follows:

1st column: ID of reference series, type INTEGER, VARCHAR or NVARCHAR
2nd column: Order of time series, type INTEGER, VARCHAR or NVARCHAR
Other columns: Series data, type INTEGER, DOUBLE or DECIMAL(p,s)

ref.data must have the same number of columns as query.data.
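As an illustration of the layout described above, the query series from the Examples section could be assembled locally in base R in long format. This is only a sketch of the table shape: `query.local` is a hypothetical local data.frame, not a HANA DataFrame, and uploading it to SAP HANA is not shown.

```r
# Local, base-R sketch of the expected long format (not a HANA DataFrame).
query.local <- data.frame(
  ID        = rep(c(1L, 2L), times = c(10L, 9L)),   # 1st column: series ID
  TIMESTAMP = c(1:10, 1:9),                         # 2nd column: order within a series
  ATTR1     = c(1:10, 7, 6, 1, 3, 2, 5, 4, 3, 3),   # remaining columns: series data
  ATTR2     = c(5.2, 5.1, 2.0, 0.3, 1.2, 7.7, 0.0, 1.1, 3.2, 2.3,
                2.0, 1.4, 0.9, 1.2, 10.2, 2.3, 4.5, 4.6, 3.5)
)

# One row per (series, time step); series may have different lengths.
table(query.local$ID)
```

A reference table built the same way needs the same number of columns, which can be checked locally with `ncol()` before the data is sent to the database.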

radius

integer, optional
Restricts the match curve to an area near the diagonal.
-1 means no such constraint; otherwise the value must be nonnegative.
Setting this constraint may yield a suboptimal result in exchange for reduced runtime.
An inappropriate value may lead to no result at all (e.g. 0 for two time-series of different lengths).
Defaults to -1.
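The effect of radius can be pictured with a small base-R sketch. It assumes the constraint is a band of the form |i - j| <= radius around the diagonal of the query-by-reference grid; whether PAL defines the band in exactly this way is an assumption, and `band.mask` is a hypothetical helper, not part of hana.ml.r.

```r
# Hypothetical helper: which cells (i, j) of an n-by-m grid fall inside
# a band of half-width `radius` around the diagonal?
# Assumption: the band is |i - j| <= radius; a negative radius disables it.
band.mask <- function(n, m, radius = -1) {
  if (radius < 0) return(matrix(TRUE, n, m))
  abs(outer(seq_len(n), seq_len(m), "-")) <= radius
}

# With equal lengths, radius = 0 leaves exactly the diagonal cells.
sum(band.mask(5, 5, radius = 0))

# With lengths 10 and 11, radius = 0 excludes the end cell (10, 11),
# so an alignment that must end there cannot exist.
band.mask(10, 11, radius = 0)[10, 11]
```

Under this reading, a radius of at least the length difference of the two series is needed for a "closed" alignment to be feasible.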

distance.level

c("manhattan", "euclidean", "minkowski", "chebyshev", "cosine"), optional
Specifies the method used to compute the distance between two points.

"manhattan" Manhattan distance (L1 norm)
"euclidean" Euclidean distance (L2 norm)
"minkowski" Minkowski distance (p-norm)
"chebyshev" Chebyshev distance (maximum norm)
"cosine" Cosine distance

Defaults to "euclidean".
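For orientation, the five point distances can be written out in base R as follows. This is a sketch of the usual textbook definitions, not PAL's implementation; `point.dist` is a hypothetical helper, and taking cosine distance as 1 minus cosine similarity is an assumption.

```r
# Distance between two observations x and y, each a numeric vector holding
# the series-data columns of one time step.
point.dist <- function(x, y, method = "euclidean", p = 3) {
  d <- abs(x - y)
  switch(method,
    manhattan = sum(d),
    euclidean = sqrt(sum(d^2)),
    minkowski = sum(d^p)^(1 / p),
    chebyshev = max(d),
    # Assumption: cosine distance taken as 1 - cosine similarity.
    cosine    = 1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2)),
    stop("unknown method: ", method)
  )
}

x <- c(1, 5.2)   # first query point (ATTR1, ATTR2) from the Examples section
y <- c(10, 1.0)  # first reference point
sapply(c("manhattan", "euclidean", "minkowski", "chebyshev", "cosine"),
       function(m) point.dist(x, y, method = m))
```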

minkowski.power

double, optional
Only valid when distance.level is "minkowski".
Specifies the power value of Minkowski p-norm.
Defaults to 3.0.

alignment.method

character, optional
Specifies the alignment method for begin/end points of time-series.
Valid options include:

"closed": both begin and end points must be aligned.
"open_end": only the begin point needs to be aligned.
"open_begin": only the end point needs to be aligned.
"open": neither the begin nor the end point needs to be aligned.

Defaults to "closed".

step.pattern

integer or list, optional
Specifies the step pattern for DTW calculation.
Integers refer to predefined step patterns, ranging from 1 to 5.
Lists are for custom step patterns, where each element is a step.
For example, the predefined step pattern 1 can be written as a custom step pattern as follows:
list(c(1,0,1), c(1,1,1), c(0,1,1)),
while predefined step pattern 5 can be written as:
list(c(1,1,1,1,0,1), c(1,1,1), c(1,1,0.5,0,1,0.5)).
Note: Each step can be a simple step or a composite one made of several simple steps executed consecutively. Each simple step is represented by three numbers (a triad): the first two give the movement along the query and reference index respectively, and the third gives the weight of the simple step.
Defaults to 3.
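To make the triad encoding concrete, the following base-R sketch splits each custom step into its simple steps and sums their movements and weights. `as.triads` and its column labels are hypothetical names chosen for illustration; they are not part of hana.ml.r.

```r
# Split a custom step (numeric vector whose length is a multiple of 3)
# into its simple steps, one triad per row.
as.triads <- function(step) {
  stopifnot(length(step) %% 3 == 0)
  m <- matrix(step, ncol = 3, byrow = TRUE)
  colnames(m) <- c("query.move", "ref.move", "weight")
  m
}

# Custom form of predefined step pattern 5, as given above.
pattern5 <- list(c(1,1,1,1,0,1), c(1,1,1), c(1,1,0.5,0,1,0.5))
triads <- lapply(pattern5, as.triads)

# Net movement and total weight of each (possibly composite) step.
t(sapply(triads, colSums))
```

Read this way, the first step of pattern 5 advances two positions in the query and one in the reference, while the third advances one in the query and two in the reference.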

save.alignment

logical, optional
Specifies whether or not to output alignment information.
If set to FALSE, the alignment table will be empty.
Defaults to FALSE.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads. Values between 0 and 1 will use up to that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads used is then heuristically determined.
Defaults to -1.

Value

Returns a list of DataFrames.

DataFrame Result for DTW, structured as follows:
- QUERY_<ID column of query data> : ID of time-series for query.
- REF_<ID column of reference data> : ID of time-series for reference.
- DISTANCE : DTW distance of the two time-series.
- WEIGHT : Total weight of the match.
- AVG_DISTANCE : Normalized distance of the two time-series.
DataFrame Alignment (optimal match) between input time-series, structured as follows:
- QUERY_<ID column of query data> : ID of time-series for query.
- REF_<ID column of reference data> : ID of time-series for reference.
- QUERY_INDEX : Corresponding to index (timestamp) of query data.
- REF_INDEX : Corresponding to index (timestamp) of reference data.
DataFrame Statistics for time series, structured as follows:
- STAT_NAME : Statistics name.
- STAT_VALUE : Statistics value.

Details

Dynamic Time Warping (DTW) measures the similarity between two time series that may vary in speed. It stretches or compresses one or both series so that one matches the other as closely as possible. It can be used for pattern matching and anomaly detection.
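The idea can be sketched in a few lines of base R. The function below is the classic textbook recursion, not PAL's implementation: it assumes Euclidean point distance, a "closed" alignment, no radius constraint, and the three unit-weight steps list(c(1,0,1), c(1,1,1), c(0,1,1)). `dtw.sketch` is a hypothetical name.

```r
# Textbook DTW between two series (rows = time steps, columns = attributes).
dtw.sketch <- function(q, r) {
  q <- as.matrix(q); r <- as.matrix(r)
  n <- nrow(q); m <- nrow(r)

  # Local cost: distance between every query point and every reference point.
  cost <- matrix(0, n, m)
  for (i in seq_len(n)) for (j in seq_len(m))
    cost[i, j] <- sqrt(sum((q[i, ] - r[j, ])^2))

  # Accumulated cost, padded with an Inf border so the recursion
  # needs no special cases at the edges.
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) for (j in seq_len(m))
    acc[i + 1, j + 1] <- cost[i, j] +
      min(acc[i, j + 1], acc[i, j], acc[i + 1, j])

  acc[n + 1, m + 1]
}

# The same shape traversed at half the speed still matches perfectly.
dtw.sketch(c(1, 2, 3), c(1, 1, 2, 2, 3, 3))
```

Because each reference point may be matched to the same query point repeatedly, the slower series is effectively compressed onto the faster one, which is the stretching and compressing described above.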

Examples

Input DataFrames:


> query.data$Collect()
   ID TIMESTAMP ATTR1 ATTR2
1   1         1     1   5.2
2   1         2     2   5.1
3   1         3     3   2.0
4   1         4     4   0.3
5   1         5     5   1.2
6   1         6     6   7.7
7   1         7     7   0.0
8   1         8     8   1.1
9   1         9     9   3.2
10  1        10    10   2.3
11  2         1     7   2.0
12  2         2     6   1.4
13  2         3     1   0.9
14  2         4     3   1.2
15  2         5     2  10.2
16  2         6     5   2.3
17  2         7     4   4.5
18  2         8     3   4.6
19  2         9     3   3.5

> ref.data$Collect()
   ID TIMESTAMP ATTR1 ATTR2
1   3         1    10   1.0
2   3         2     5   2.0
3   3         3     2   3.0
4   3         4     8   1.4
5   3         5     1  10.8
6   3         6     5   7.7
7   3         7     5   6.3
8   3         8    12   2.4
9   3         9    20   9.4
10  3        10     4   0.5
11  3        11     6   2.2

Call the function:


> output <- hanaml.DTW(query.data,
                       ref.data,
                       radius = -1,
                       thread.ratio = 1,
                       distance.level = "euclidean",
                       step.pattern = list(c(1,1,1,1,0,1),
                                           c(1,1,1),
                                           c(1,1,0.5,0,1,0.5)),
                       alignment.method = "closed",
                       save.alignment = TRUE)

Results:


> output[["alignment"]]$Collect()
   QUERY_ID REF_ID QUERY_INDEX REF_INDEX
1         1      3           0         0
2         1      3           1         1
3         1      3           2         2
4         1      3           3         2
5         1      3           4         3
6         1      3           5         4
7         1      3           5         5
8         1      3           6         6
9         1      3           6         7
10        1      3           7         8
11        1      3           7         9
12        1      3           8        10
13        1      3           9        10
14        2      3           0         0
15        2      3           1         1
16        2      3           2         2
17        2      3           3         3
18        2      3           4         4
19        2      3           4         5
20        2      3           5         6
21        2      3           6         6
22        2      3           7         7
23        2      3           7         8
24        2      3           8         9
25        2      3           8        10
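The indices in this output appear to be 0-based positions within each series (query series 1 has 10 points and its QUERY_INDEX runs from 0 to 9). Under that assumption, matched values can be looked up locally as sketched below; `alignment.local`, `query.attr1` and `ref.attr1` are hypothetical local copies re-typed from the tables above, not objects returned by hanaml.DTW.

```r
# First rows of the alignment output above, re-typed as a local data.frame.
alignment.local <- data.frame(
  QUERY_ID    = 1L,
  REF_ID      = 3L,
  QUERY_INDEX = c(0L, 1L, 2L, 3L, 4L),
  REF_INDEX   = c(0L, 1L, 2L, 2L, 3L)
)

# ATTR1 values of query series 1 and reference series 3, in timestamp order.
query.attr1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
ref.attr1   <- c(10, 5, 2, 8, 1, 5, 5, 12, 20, 4, 6)

# Assumption: indices are 0-based positions, so add 1 for R's 1-based vectors.
alignment.local$QUERY_ATTR1 <- query.attr1[alignment.local$QUERY_INDEX + 1L]
alignment.local$REF_ATTR1   <- ref.attr1[alignment.local$REF_INDEX + 1L]
alignment.local
```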