dtw

hana_ml.algorithms.pal.tsa.dtw.dtw(query_data, ref_data, radius=None, thread_ratio=None, distance_method=None, minkowski_power=None, alignment_method=None, step_pattern=None, save_alignment=None)

DTW (Dynamic Time Warping) is a method for calculating the distance or similarity between two time series. It aligns one series with the other as closely as possible by stretching or compressing one or both of them.
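The idea can be illustrated with a minimal pure-Python sketch. It is not the PAL implementation: it assumes univariate series, an absolute-difference point cost and the classic unit-weight steps (match, stretch query, stretch reference). The function name dtw_distance is hypothetical.

```python
def dtw_distance(query, ref):
    """Classic DTW with absolute-difference cost and unit-weight steps (illustrative only)."""
    n, m = len(query), len(ref)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning query[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat the reference point
                                 cost[i][j - 1],      # repeat the query point
                                 cost[i - 1][j - 1])  # advance both series
    return cost[n][m]


# The reference is the query with one point repeated, so warping absorbs the difference.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Here the stretched pair aligns at zero cost even though the series differ in length, which is the behaviour the description above refers to.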

Parameters:

query_data : DataFrame

Query data for DTW, expected to be structured as follows:

1st column : ID of query time-series, type INTEGER, VARCHAR or NVARCHAR.

2nd column : Order (timestamps) of query time-series, type INTEGER, VARCHAR or NVARCHAR.

Other columns : Series data, type INTEGER, DOUBLE or DECIMAL.

ref_data : DataFrame

Reference data for DTW, expected to be structured as follows:

1st column : ID of reference time-series, type INTEGER, VARCHAR or NVARCHAR.

2nd column : Order (timestamps) of reference time-series, type INTEGER, VARCHAR or NVARCHAR.

Other columns : Series data, type INTEGER, DOUBLE or DECIMAL. Must have the same cardinality (i.e. number of columns) as the series columns of query_data.
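As a rough illustration of the expected layout, the sketch below builds long-format rows (ID, order, series values) in plain Python and checks that query and reference rows carry the same number of series columns. The helpers to_long_rows and series_width are hypothetical and not part of hana_ml; loading the rows into HANA DataFrames is left out.

```python
def to_long_rows(series_by_id):
    """Flatten {id: [(v1, v2, ...), ...]} into (ID, ORDER, v1, v2, ...) rows."""
    rows = []
    for series_id, points in series_by_id.items():
        for order, values in enumerate(points, start=1):
            rows.append((series_id, order) + tuple(values))
    return rows


def series_width(rows):
    """Number of series columns, i.e. everything after the ID and order columns."""
    widths = {len(row) - 2 for row in rows}
    if len(widths) != 1:
        raise ValueError("rows have inconsistent numbers of series columns")
    return widths.pop()


# One univariate query series and two univariate reference series (made-up values).
query_rows = to_long_rows({1: [(1.0,), (2.0,), (3.0,)]})
ref_rows = to_long_rows({"A": [(1.0,), (2.0,), (2.0,), (3.0,)],
                         "B": [(0.0,), (4.0,)]})

# Both inputs must have the same number of series columns.
same_width = series_width(query_rows) == series_width(ref_rows)
```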

radius : int, optional

Specifies a constraint that restricts the match curve to an area near the diagonal.

Specifically, it ensures that the absolute difference between each pair of subscripts in the match curve is no greater than radius.

-1 means no such constraint; otherwise radius must be nonnegative.

Defaults to -1.
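The effect of such a band constraint can be sketched as follows. This is an illustrative pure-Python version, not PAL's code: it assumes univariate series, absolute-difference cost and unit-weight steps, and the hypothetical function dtw_with_radius returns None when no path fits inside the band (loosely mirroring a NULL distance).

```python
def dtw_with_radius(query, ref, radius=-1):
    """DTW restricted to cells (i, j) with |i - j| <= radius; radius=-1 disables the band."""
    n, m = len(query), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if radius >= 0 and abs(i - j) > radius:
                continue  # cell lies outside the band and stays unreachable
            d = abs(query[i - 1] - ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return None if cost[n][m] == inf else cost[n][m]


unconstrained = dtw_with_radius([1, 2, 3], [2, 3, 4])            # free warping
diagonal_only = dtw_with_radius([1, 2, 3], [2, 3, 4], radius=0)  # point-by-point match
infeasible = dtw_with_radius([1, 2, 3], [1, 2, 2, 3], radius=0)  # lengths differ by 1
```

With radius=0 only the diagonal is allowed, so series of different lengths cannot be aligned at all, and equal-length series are compared point by point.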

thread_ratio : float, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.

Defaults to -1.

distance_method : {'manhattan', 'euclidean', 'minkowski', 'chebyshev', 'cosine'}, optional

Specifies the method to compute the distance between two points.

'manhattan' : Manhattan distance

'euclidean' : Euclidean distance

'minkowski' : Minkowski distance

'chebyshev' : Chebyshev distance

'cosine' : Cosine distance

Defaults to 'euclidean'.
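For orientation, the listed point-wise distances can be written out with their textbook definitions. The sketch below is an assumption about what each name means (in particular, cosine distance is taken as 1 minus cosine similarity); PAL's exact formulas may differ, and the function point_distance is hypothetical.

```python
import math


def point_distance(x, y, method="euclidean", power=3.0):
    """Textbook distance between two equal-length points (illustrative definitions)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if method == "manhattan":
        return sum(diffs)
    if method == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if method == "minkowski":
        return sum(d ** power for d in diffs) ** (1.0 / power)
    if method == "chebyshev":
        return max(diffs)
    if method == "cosine":
        dot = sum(a * b for a, b in zip(x, y))
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_y = math.sqrt(sum(b * b for b in y))
        if norm_x == 0 or norm_y == 0:
            raise ValueError("cosine distance is undefined for zero vectors")
        return 1.0 - dot / (norm_x * norm_y)
    raise ValueError("unknown distance method: %r" % (method,))


# Distances between the origin and the point (3, 4) under several methods.
origin, p = (0.0, 0.0), (3.0, 4.0)
results = {name: point_distance(origin, p, name)
           for name in ("manhattan", "euclidean", "minkowski", "chebyshev")}
```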

minkowski_power : double, optional

Specifies the power of the Minkowski distance method.

Only valid when distance_method is 'minkowski'.

Defaults to 3.

alignment_method : {'closed', 'open_begin', 'open_end', 'open'}, optional

Specifies the alignment constraint w.r.t. beginning and end points in reference time-series.

'closed' : Both beginning and end points must be aligned.

'open_end' : Only the beginning point needs to be aligned.

'open_begin' : Only the end point needs to be aligned.

'open' : Neither the beginning nor the end point needs to be aligned.

Defaults to 'closed'.
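One common way to realise these boundary conditions is sketched below. It is an assumption about the general technique rather than PAL's implementation: an open beginning lets the path start at any reference point at no cost, and an open end takes the best accumulated cost over all reference end points. Unit-weight steps and absolute-difference cost are used for brevity, and dtw_aligned is a hypothetical name.

```python
def dtw_aligned(query, ref, alignment_method="closed"):
    """DTW where the reference's beginning and/or end may be left unaligned (illustrative)."""
    n, m = len(query), len(ref)
    inf = float("inf")
    open_begin = alignment_method in ("open_begin", "open")
    open_end = alignment_method in ("open_end", "open")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    if open_begin:
        # The query may start matching at any reference index for free.
        for j in range(m + 1):
            cost[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    if open_end:
        # The query may finish at any reference index.
        return min(cost[n][1:])
    return cost[n][m]


# The query [2, 3] appears in the middle of the reference [1, 2, 3, 9].
distances = {method: dtw_aligned([2, 3], [1, 2, 3, 9], method)
             for method in ("closed", "open_begin", "open_end", "open")}
```

In this made-up example only 'open' reaches a distance of zero, because the matching segment sits strictly inside the reference.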

step_pattern : int or ListOfTuples, optional

Specifies the type of step patterns for the DTW algorithm.

There are five predefined types of step patterns, numbered 1 to 5.

Users can also specify custom step patterns by providing a list of tuples.

Defaults to 3.

Note

A custom defined step pattern is represented either by a single triad or a tuple of consecutive triads, where each triad is in the form of \((\Delta x, \Delta y, \omega)\) with \(\Delta x\) being the increment in query data index, \(\Delta y\) being the increment in reference data index, and \(\omega\) being the weight.

A custom defined step pattern type is simply a list of step patterns.

For example, the predefined step patterns of type 5 can also be specified via custom defined step pattern type as follows:

[((1,1,1), (1,0,1)), (1,1,1), ((1,1,0.5), (0,1,0.5))].

For more details on step patterns, see the PAL DTW documentation.
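To make the notation concrete, the sketch below normalizes a custom step pattern list so that every pattern is a tuple of triads, and sums the increments and weights of each pattern. Both helpers (normalize_step_patterns and total_increment) are hypothetical and only illustrate the data layout described in the note; they are not part of hana_ml.

```python
def normalize_step_patterns(step_pattern):
    """Return each step pattern as a tuple of (dx, dy, weight) triads."""
    normalized = []
    for pattern in step_pattern:
        # A bare triad such as (1, 1, 1) is a pattern consisting of a single step.
        if len(pattern) == 3 and all(isinstance(v, (int, float)) for v in pattern):
            pattern = (pattern,)
        for triad in pattern:
            if len(triad) != 3:
                raise ValueError("each step must be a (dx, dy, weight) triad: %r" % (triad,))
        normalized.append(tuple(tuple(t) for t in pattern))
    return normalized


def total_increment(pattern):
    """Sum of query increments, reference increments and weights over one pattern."""
    return (sum(t[0] for t in pattern),
            sum(t[1] for t in pattern),
            sum(t[2] for t in pattern))


# The custom equivalent of predefined type 5 quoted in the note above.
type_5 = [((1, 1, 1), (1, 0, 1)), (1, 1, 1), ((1, 1, 0.5), (0, 1, 0.5))]
patterns = normalize_step_patterns(type_5)
increments = [total_increment(p) for p in patterns]
```

Read this way, the first pattern advances the query by two and the reference by one, the second advances both by one, and the third advances the query by one and the reference by two.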

save_alignment : bool, optional

Specifies whether to output alignment information or not.

True : Output the alignment information.

False : Do not output the alignment information.

Defaults to False.

Returns:

DataFrames

DataFrame 1 : Result for DTW, structured as follows:

QUERY_<ID column name of query data table> : ID of the query time-series.

REF_<ID column name of reference data table> : ID of the reference time-series.

DISTANCE : DTW distance of the two series. NULL if there is no valid result.

WEIGHT : Total weight of match.

AVG_DISTANCE : Normalized distance of two time-series. NULL if WEIGHT is near 0.

DataFrame 2 : Alignment information table, structured as follows:

QUERY_<ID column name of query data table> : ID of query time-series.

REF_<ID column name of reference data table> : ID of reference time-series.

QUERY_INDEX : Corresponding to index of query time-series.

REF_INDEX : Corresponding to index of reference time-series.

DataFrame 3 : Statistics.

Examples

>>> res, align, stats = dtw(query_data=df_1,
...                         ref_data=df_2,
...                         step_pattern=[((1,1,1),(1,0,1)), (1,1,1), ((1,1,0.5),(0,1,0.5))],
...                         save_alignment=True)
>>> res.collect()