fast_dtw

hana_ml.algorithms.pal.tsa.fast_dtw.fast_dtw(data, radius, thread_ratio=None, distance_method=None, minkowski_power=None, save_alignment=None)

DTW (Dynamic Time Warping) is a method for calculating the distance or similarity between two time series. Fast DTW is a variant of DTW that accelerates the computation when the time series are very long. It recursively reduces the size of the time series, calculates the DTW path on the reduced version, and then refines that path on the original series. It may lose some accuracy relative to the exact DTW distance in exchange for faster computation.
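For orientation, the exact DTW distance that fast DTW approximates can be sketched in plain Python with dynamic programming. This is an illustration only, not the PAL implementation: the function name `dtw_distance` is hypothetical, the series are assumed to be one-dimensional, and the point cost is assumed to be the absolute difference.

```python
import math

def dtw_distance(a, b):
    """Exact DTW distance between two 1-D sequences (hypothetical helper).

    Assumes |x - y| as the cost of matching two points.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The table has `len(a) * len(b)` cells, which is the quadratic cost that fast DTW avoids on long series. Two series that differ only by a repeated sample, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have distance 0 under this sketch.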

Parameters
data : DataFrame

Input data, expected to be structured as follows:

  • ID for multiple time series

  • Timestamps

  • Attributes of time series
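The layout above corresponds to a "long" table in which every row carries a series ID, a timestamp, and one or more attribute values. The sketch below uses plain Python tuples rather than a HANA table, and the IDs, timestamps, values, and the helper `group_series` are all made up for illustration.

```python
from collections import defaultdict

# Each row: (ID, TIMESTAMP, ATTRIBUTE_1, ...). All values here are hypothetical.
rows = [
    ("A", 1, 0.0), ("A", 2, 1.0), ("A", 3, 2.0),
    ("B", 1, 0.0), ("B", 2, 0.0), ("B", 3, 1.0), ("B", 4, 2.0),
]

def group_series(rows):
    """Group long-format rows into one attribute sequence per ID, ordered by timestamp."""
    series = defaultdict(list)
    for series_id, _timestamp, *attrs in sorted(rows, key=lambda r: (r[0], r[1])):
        series[series_id].append(tuple(attrs))
    return dict(series)
```

Series in the same table may have different lengths, as "A" and "B" do here; DTW does not require equal-length inputs.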

radius : int

Parameter of the fast DTW algorithm that balances DTW accuracy against runtime. A larger radius gives a more accurate result but runs more slowly. Must be positive.
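To make the role of the radius concrete, here is a minimal sketch of the coarsen, project, and refine recursion commonly used for fast DTW. It is not the PAL implementation: the helper names are hypothetical, the series are assumed to be one-dimensional, the point cost is assumed to be the absolute difference, and coarsening is assumed to average adjacent pairs.

```python
import math

def windowed_dtw(x, y, window=None):
    """DTW restricted to `window`, a collection of 0-based (i, j) cells.

    Returns (distance, path). With window=None every cell is evaluated.
    """
    n, m = len(x), len(y)
    if window is None:
        window = [(i, j) for i in range(n) for j in range(m)]
    # cost[cell] = (accumulated cost, predecessor cell); (-1, -1) is a virtual start
    cost = {(-1, -1): (0.0, None)}
    for i, j in sorted(window):  # row-major order, so predecessors are visited first
        candidates = [(cost[p][0], p)
                      for p in ((i - 1, j), (i, j - 1), (i - 1, j - 1)) if p in cost]
        best_cost, best_prev = min(candidates, default=(math.inf, None))
        cost[(i, j)] = (abs(x[i] - y[j]) + best_cost, best_prev)
    # Walk predecessors back from the last cell to recover the warping path
    path, cell = [], (n - 1, m - 1)
    while cell is not None and cell != (-1, -1):
        path.append(cell)
        cell = cost[cell][1]
    path.reverse()
    return cost[(n - 1, m - 1)][0], path

def coarsen(x):
    """Halve the resolution by averaging adjacent pairs (a trailing odd sample is dropped)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def expand_window(path, n, m, radius):
    """Project a coarse path to the finer grid, widened by `radius` coarse cells."""
    window = set()
    for i, j in path:
        for ci in range(i - radius, i + radius + 1):
            for cj in range(j - radius, j + radius + 1):
                # Each coarse cell covers a 2x2 block at the finer resolution
                for fi in (2 * ci, 2 * ci + 1):
                    for fj in (2 * cj, 2 * cj + 1):
                        if 0 <= fi < n and 0 <= fj < m:
                            window.add((fi, fj))
    return window

def fast_dtw_sketch(x, y, radius):
    """Approximate DTW: solve a coarse problem, then refine near its path."""
    if radius < 1:
        raise ValueError("radius must be positive")
    if len(x) < radius + 2 or len(y) < radius + 2:
        return windowed_dtw(x, y)  # small enough to solve exactly
    _, coarse_path = fast_dtw_sketch(coarsen(x), coarsen(y), radius)
    window = expand_window(coarse_path, len(x), len(y), radius)
    return windowed_dtw(x, y, window)
```

In this sketch a larger radius widens the band of cells evaluated around the projected path, which costs more time but leaves less room to miss the optimal alignment. Once the radius is at least as large as the series, the recursion stops immediately and the result equals the exact DTW distance.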

thread_ratio : float, optional

Controls the proportion of available threads to use.

  • 0: single thread.

  • Between 0 and 1: the fraction of available threads to use.

  • Others: heuristically determined.

Defaults to -1.

distance_method : {'manhattan', 'euclidean', 'minkowski', 'chebyshev', 'cosine'}, optional

Specifies the method to compute the distance between two points.

  • 'manhattan': Manhattan distance

  • 'euclidean': Euclidean distance

  • 'minkowski': Minkowski distance

  • 'chebyshev': Chebyshev distance

  • 'cosine': Cosine distance

Defaults to 'euclidean'.

minkowski_power : double, optional

Specifies the power of the Minkowski distance method.

Only valid when distance_method is 'minkowski'.

Defaults to 3.
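As an illustration of the distance_method options and of minkowski_power, the textbook definitions of these point distances can be written as below. The function `point_distance` is a hypothetical helper, and the cosine case assumes the common "1 minus cosine similarity" definition; PAL's exact formulas are not stated here and may differ.

```python
import math

def point_distance(p, q, method="euclidean", minkowski_power=3.0):
    """Distance between two equal-length points (hypothetical helper, textbook formulas)."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if method == "manhattan":
        return sum(diffs)
    if method == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if method == "minkowski":
        # Generalizes Manhattan (power 1) and Euclidean (power 2)
        return sum(d ** minkowski_power for d in diffs) ** (1.0 / minkowski_power)
    if method == "chebyshev":
        return max(diffs)
    if method == "cosine":
        # Assumed definition: 1 - cosine similarity; undefined for zero vectors
        dot = sum(a * b for a, b in zip(p, q))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
        return 1.0 - dot / norm
    raise ValueError(f"unknown distance method: {method!r}")
```

The power only affects the 'minkowski' branch, which matches the note that minkowski_power is only valid for that method.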

save_alignment : bool, optional

Specifies whether to output alignment information. If True, the alignment table is output.

Defaults to False.

Returns
DataFrame
Result of fast DTW, structured as follows:
  • LEFT_<ID column name of input table>: ID of one time series.

  • RIGHT_<ID column name of input table>: ID of the other time series.

  • DISTANCE: DTW distance between the two time series.

Alignment table, structured as follows:
  • LEFT_<ID column name of input table>: ID of one time series.

  • RIGHT_<ID column name of input table>: ID of the other time series.

  • LEFT_INDEX: Index of the timestamp in the time series identified by the 1st column.

  • RIGHT_INDEX: Index of the timestamp in the time series identified by the 2nd column.

Statistics for time series, structured as follows:
  • STAT_NAME: Statistics name.

  • STAT_VALUE: Statistics value.

Examples

>>> from hana_ml.algorithms.pal.tsa.fast_dtw import fast_dtw
>>> result, align, stats = fast_dtw(data, 5)