dtw
- hana_ml.algorithms.pal.tsa.dtw.dtw(query_data, ref_data, radius=None, thread_ratio=None, distance_method=None, minkowski_power=None, alignment_method=None, step_pattern=None, save_alignment=None)
DTW stands for Dynamic Time Warping, a method for calculating the distance or similarity between two time series. It matches one series to the other as closely as possible by stretching or compressing one or both of them.
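To make the idea of stretching and compressing concrete, here is a minimal pure-Python sketch of textbook DTW on two one-dimensional sequences. It is not the PAL implementation: the function name `dtw_distance` is hypothetical, the local cost is assumed to be the absolute difference, and only the classic three-way step (diagonal, up, left) with closed alignment is considered.

```python
import math


def dtw_distance(query, ref):
    """Textbook DTW distance between two 1-D numeric sequences (illustrative only)."""
    n, m = len(query), len(ref)
    # cost[i][j] holds the minimal accumulated cost of aligning
    # the first i query points with the first j reference points.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - ref[j - 1])
            # Extend the cheapest of the three neighbouring partial alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # query advances alone
                                     cost[i][j - 1],      # reference advances alone
                                     cost[i - 1][j - 1])  # both advance together
    return cost[n][m]


# The repeated 2 in the reference is absorbed by stretching the query.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

The first call returns zero even though the series have different lengths, which is exactly the property that distinguishes DTW from a pointwise distance.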
- Parameters:
- query_data : DataFrame
Query data for DTW, expected to be structured as follows:
1st column : ID of query time-series, type INTEGER, VARCHAR or NVARCHAR.
2nd column : Order (timestamps) of query time-series, type INTEGER, VARCHAR or NVARCHAR.
Other columns : Series data, type INTEGER, DOUBLE or DECIMAL.
- ref_data : DataFrame
Reference data for DTW, expected to be structured as follows:
1st column : ID of reference time-series, type INTEGER, VARCHAR or NVARCHAR.
2nd column : Order (timestamps) of reference time-series, type INTEGER, VARCHAR or NVARCHAR.
Other columns : Series data, type INTEGER, DOUBLE or DECIMAL; must have the same cardinality (i.e. number of columns) as that of query_data.
- radius : int, optional
Specifies a constraint that restricts the match curve to an area near the diagonal.
Specifically, it ensures that the absolute difference of each pair of subscripts in the match curve is no greater than radius.
-1 means no such constraint; otherwise radius must be nonnegative.
Defaults to -1.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to -1.
- distance_method : {'manhattan', 'euclidean', 'minkowski', 'chebyshev', 'cosine'}, optional
Specifies the method to compute the distance between two points.
'manhattan' : Manhattan distance
'euclidean' : Euclidean distance
'minkowski' : Minkowski distance
'chebyshev' : Chebyshev distance
'cosine' : Cosine distance
Defaults to 'euclidean'.
- minkowski_power : double, optional
Specifies the power of the Minkowski distance method.
Only valid when distance_method is 'minkowski'.
Defaults to 3.
- alignment_method : {'closed', 'open_begin', 'open_end', 'open'}, optional
Specifies the alignment constraint with respect to the beginning and end points of the reference time-series.
'closed' : Both beginning and end points must be aligned.
'open_end' : Only the beginning point needs to be aligned.
'open_begin' : Only the end point needs to be aligned.
'open' : Neither the beginning nor the end point needs to be aligned.
Defaults to 'closed'.
- step_pattern : int or ListOfTuples, optional
Specifies the type of step patterns for the DTW algorithm.
There are five predefined types of step patterns, numbered 1 to 5.
Users can also specify custom step patterns by providing a list of tuples.
Defaults to 3.
Note
A custom step pattern is represented either by a single triad or by a tuple of consecutive triads, where each triad has the form \((\Delta x, \Delta y, \omega)\), with \(\Delta x\) being the increment in the query data index, \(\Delta y\) the increment in the reference data index, and \(\omega\) the weight.
A custom step pattern type is simply a list of step patterns.
For example, the predefined step patterns of type 5 can also be specified as a custom step pattern type as follows:
[((1,1,1), (1,0,1)), (1,1,1), ((1,1,0.5), (0,1,0.5))].
For more details on step patterns, see the PAL DTW documentation.
- save_alignment : bool, optional
Specifies whether to output alignment information or not.
True : Output the alignment information.
False : Do not output the alignment information.
Defaults to False.
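The distance_method and radius parameters above can be illustrated with a short pure-Python sketch. It uses the textbook definitions of the five distances and the index-difference rule stated for radius; the helper names `point_distance` and `within_radius` are hypothetical, and PAL's exact formulas (for example its handling of zero vectors in the cosine case) are not confirmed here.

```python
import math


def point_distance(p, q, method="euclidean", minkowski_power=3.0):
    """Distance between two points given as equal-length numeric tuples."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if method == "manhattan":
        return sum(diffs)
    if method == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if method == "minkowski":
        return sum(d ** minkowski_power for d in diffs) ** (1.0 / minkowski_power)
    if method == "chebyshev":
        return max(diffs)
    if method == "cosine":
        # Assumption: cosine distance = 1 - cosine similarity, non-zero vectors only.
        dot = sum(a * b for a, b in zip(p, q))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
        return 1.0 - dot / norm
    raise ValueError(f"unknown distance method: {method!r}")


def within_radius(query_index, ref_index, radius=-1):
    """True if the index pair satisfies the radius constraint (-1 disables it)."""
    return radius < 0 or abs(query_index - ref_index) <= radius


print(point_distance((0, 0), (3, 4)))               # 5.0
print(point_distance((0, 0), (3, 4), "manhattan"))  # 7
print(point_distance((0, 0), (3, 4), "chebyshev"))  # 4
print(within_radius(2, 5, radius=2))                # False
```

With a nonnegative radius, index pairs far from the diagonal are excluded from the match curve, which narrows the search and prevents extreme warping.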
- Returns:
- DataFrames
DataFrame 1 : Result for DTW, structured as follows:
QUERY_<ID column name of query data table> : ID of the query time-series.
REF_<ID column name of reference data table> : ID of the reference time-series.
DISTANCE : DTW distance of the two series. NULL if there is no valid result.
WEIGHT : Total weight of match.
AVG_DISTANCE : Normalized distance of two time-series. NULL if WEIGHT is near 0.
DataFrame 2 : Alignment information table, structured as follows:
QUERY_<ID column name of query data table> : ID of query time-series.
REF_<ID column name of reference data table> : ID of reference time-series.
QUERY_INDEX : Corresponding to index of query time-series.
REF_INDEX : Corresponding to index of reference time-series.
DataFrame 3 : Statistics.
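The alignment information table pairs each query index with a reference index. As a rough illustration of where such pairs come from, the sketch below backtracks through a textbook DTW cost matrix. It is an assumption-laden example rather than PAL's procedure: `dtw_with_alignment` is a hypothetical name, the local cost is the absolute difference, only the classic three-way step is used, and the indices are 0-based list positions, which may differ from the convention used in QUERY_INDEX and REF_INDEX.

```python
import math


def dtw_with_alignment(query, ref):
    """Return (distance, path) where path is a list of (query_pos, ref_pos) pairs."""
    n, m = len(query), len(ref)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - ref[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Walk back from the end point, always moving to the cheapest predecessor.
    # On ties, min() keeps the first candidate, so the diagonal is preferred.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


distance, path = dtw_with_alignment([1, 2, 3], [1, 2, 2, 3])
print(distance)  # 0.0
print(path)      # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

In the printed path, query position 1 appears twice: the single query value 2 is matched to both 2s in the reference, which is the stretching that the alignment table records.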
Examples
>>> res, align, stats = dtw(query_data=df_1, ref_data=df_2,
...                         step_pattern=[((1,1,1),(1,0,1)), (1,1,1), ((1,1,0.5),(0,1,0.5))],
...                         save_alignment=True)
>>> res.collect()
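The step_pattern argument in the example mixes single triads with tuples of triads. The following sketch shows one way to inspect such a list before passing it on. The helpers `normalize_step_pattern` and `summarize_step_pattern` are hypothetical and not part of hana_ml, and summing the weights per pattern is only an illustrative summary, not a statement about how PAL accumulates cost.

```python
def normalize_step_pattern(pattern):
    """Return one custom step pattern as a tuple of (dx, dy, weight) triads."""
    # A single triad such as (1, 1, 1) starts with a number, while a tuple
    # of consecutive triads such as ((1, 1, 1), (1, 0, 1)) starts with a tuple.
    if isinstance(pattern[0], (int, float)):
        pattern = (pattern,)
    triads = tuple(tuple(t) for t in pattern)
    for triad in triads:
        if len(triad) != 3:
            raise ValueError(f"expected (dx, dy, weight), got {triad!r}")
    return triads


def summarize_step_pattern(pattern):
    """Total query increment, reference increment and weight of one pattern."""
    triads = normalize_step_pattern(pattern)
    return (sum(t[0] for t in triads),
            sum(t[1] for t in triads),
            sum(t[2] for t in triads))


# The custom step pattern type used in the example above.
step_pattern_type = [((1, 1, 1), (1, 0, 1)), (1, 1, 1), ((1, 1, 0.5), (0, 1, 0.5))]
for pattern in step_pattern_type:
    print(normalize_step_pattern(pattern), summarize_step_pattern(pattern))
```

Normalizing every entry to a tuple of triads makes the two accepted forms uniform, so malformed entries can be caught locally before the call is sent to the database.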