hanaml.FastDTW.Rd
hanaml.FastDTW is an R wrapper
for SAP HANA PAL Fast DTW.
hanaml.FastDTW(
data,
radius,
distance.level = NULL,
minkowski.power = NULL,
save.alignment = NULL,
thread.ratio = NULL
)
DataFrame
DataFrame containing the time series data, structured as follows:
ID of time series : INTEGER or CHARACTER
Timestamps of time series : INTEGER or CHARACTER
Columns for time series data : INTEGER or DOUBLE
integer
Parameter of the fast DTW algorithm that balances DTW
accuracy against runtime. Larger values give more accurate results but run more slowly.
Must be positive.
Defaults to 3.0.
{"manhattan", "euclidean", "minkowski",
"chebyshev", "cosine"}, optional
Specifies the method used to compute distance between two points.
"manhattan"
Manhattan norm (L1 norm)
"euclidean"
Euclidean norm (L2 norm)
"minkowski"
Minkowski norm (p-norm)
"chebyshev"
Chebyshev norm (maximum norm)
"cosine"
Cosine similarity
Defaults to "euclidean".
double, optional
Only valid when distance.level is "minkowski".
Specifies the power p of the Minkowski p-norm.
Defaults to 3.0.
logical, optional
Specifies whether or not to output
alignment information.
Defaults to FALSE.
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads. Values between 0 and 1 will use up to
that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads
used is then determined heuristically.
Defaults to -1.
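The point-to-point measures selectable through distance.level can be sketched in base R as follows. This is an illustration only, not PAL's implementation: the helper name point_distance is hypothetical, and the "cosine" branch assumes the distance is taken as one minus the cosine similarity, which this page does not confirm.

```r
# Hypothetical helper illustrating the distance.level options for two points
# given as numeric vectors of equal length. Not part of hanaml or PAL.
point_distance <- function(a, b,
                           level = c("euclidean", "manhattan", "minkowski",
                                     "chebyshev", "cosine"),
                           power = 3) {
  level <- match.arg(level)
  diff <- abs(a - b)
  switch(level,
    manhattan = sum(diff),
    euclidean = sqrt(sum(diff^2)),
    # power plays the role of minkowski.power.
    minkowski = sum(diff^power)^(1 / power),
    chebyshev = max(diff),
    # Assumption: cosine distance = 1 - cosine similarity.
    cosine    = 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  )
}

a <- c(1, 2)
b <- c(4, 6)
print(point_distance(a, b, "manhattan"))  # 7
print(point_distance(a, b, "euclidean"))  # 5
print(point_distance(a, b, "chebyshev"))  # 4
```

With power = 2 the Minkowski branch reduces to the Euclidean norm, and with power = 1 to the Manhattan norm.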
Returns a list of DataFrames.
DataFrame
Result of fast DTW, structured as follows:
LEFT_<ID column name of input table> : ID of one time series
RIGHT_<ID column name of input table> : ID of another time series
DISTANCE : DTW distance between the two time series
DataFrame
Alignment (optimal match) between the input time series, structured as follows:
LEFT_<ID column name of input table> : ID of one time series
RIGHT_<ID column name of input table> : ID of another time series
LEFT_INDEX : Index of the matched timestamp in the time series identified by the 1st column
RIGHT_INDEX : Index of the matched timestamp in the time series identified by the 2nd column
DataFrame
Statistics for time series, structured as follows:
STAT_NAME : Statistics name
STAT_VALUE : Statistics value
Dynamic Time Warping is a method for measuring similarity between two
time series, which may vary in their speed. It can be used for pattern
matching and anomaly detection.
Fast DTW is a variant of DTW that accelerates the computation when
the time series are very long. It recursively reduces the
size of the time series, calculates the DTW path on the reduced version,
and then refines that path on the original series.
It may lose some accuracy in the DTW distance in exchange for
faster computation.
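As a point of reference, exact (non-accelerated) DTW can be sketched in a few lines of base R. This is a minimal sketch under stated assumptions, not the PAL algorithm: the helper name dtw_distance is hypothetical, it uses the Euclidean point distance and the standard three-neighbour recurrence, and it fills the full cost matrix instead of applying the coarsen-and-refine strategy that fast DTW relies on. The two series are copied from the example input on this page.

```r
# Hypothetical helper: exact DTW between two multivariate series
# (matrices with one row per time point). Not part of hanaml or PAL.
dtw_distance <- function(x, y) {
  n <- nrow(x)
  m <- nrow(y)
  # Point-to-point Euclidean cost matrix.
  cost <- matrix(0, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }
  # Accumulated cost, padded with a border of Inf so the recurrence
  # needs no special cases on the first row and column.
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      acc[i + 1, j + 1] <- cost[i, j] +
        min(acc[i, j + 1], acc[i + 1, j], acc[i, j])
    }
  }
  acc[n + 1, m + 1]
}

# Series with ID 1 and ID 2 from the example input.
s1 <- cbind(ATTR1 = c(1, 2, 3, 4, 5),
            ATTR2 = c(5.2, 5.1, 2.0, 0.3, 1.2))
s2 <- cbind(ATTR1 = c(7, 6, 1, 3, 2, 5, 4),
            ATTR2 = c(2.0, 1.4, 0.9, 1.2, 10.2, 2.3, 4.5))
d <- dtw_distance(s1, s2)
print(round(d, 4))
```

For these two short series the sketch yields about 29.2764, the same value as the DISTANCE column in the example results, where the radius of 5 is large relative to the series lengths.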
Input DataFrame:
> data$Collect()
ID TIMESTAMP ATTR1 ATTR2
1 1 1 1 5.2
2 1 2 2 5.1
3 1 3 3 2.0
4 1 4 4 0.3
5 1 5 5 1.2
6 2 1 7 2.0
7 2 2 6 1.4
8 2 3 1 0.9
9 2 4 3 1.2
10 2 5 2 10.2
11 2 6 5 2.3
12 2 7 4 4.5
Call the function:
> result <- hanaml.FastDTW(data = data, radius = 5, thread.ratio = 1,
                           distance.level = "euclidean", save.alignment = TRUE)
Results:
> result[[1]]$Collect()
LEFT_ID RIGHT_ID DISTANCE
1 1 2 29.2764
> result[[2]]$Collect()
LEFT_ID RIGHT_ID LEFT_INDEX RIGHT_INDEX
1 1 2 0 0
2 1 2 1 1
3 1 2 2 2
4 1 2 2 3
5 1 2 2 4
6 1 2 3 5
7 1 2 4 6
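To see how the alignment relates to the reported distance, the matched pairs can be re-scored locally. The sketch below is an illustration in base R that copies the input series and the alignment rows from the output above into local objects (no HANA connection is used). It assumes LEFT_INDEX and RIGHT_INDEX are zero-based positions within each series, which is inferred from the values shown rather than stated on this page.

```r
# Example series copied from the input table (rows = time points).
s1 <- cbind(ATTR1 = c(1, 2, 3, 4, 5),
            ATTR2 = c(5.2, 5.1, 2.0, 0.3, 1.2))
s2 <- cbind(ATTR1 = c(7, 6, 1, 3, 2, 5, 4),
            ATTR2 = c(2.0, 1.4, 0.9, 1.2, 10.2, 2.3, 4.5))

# Alignment copied from result[[2]]; indices are assumed to be zero-based.
alignment <- data.frame(
  LEFT_INDEX  = c(0, 1, 2, 2, 2, 3, 4),
  RIGHT_INDEX = c(0, 1, 2, 3, 4, 5, 6)
)

# Euclidean cost of each matched pair (+1 converts to R's one-based rows).
step_cost <- mapply(
  function(i, j) sqrt(sum((s1[i + 1, ] - s2[j + 1, ])^2)),
  alignment$LEFT_INDEX, alignment$RIGHT_INDEX
)
path_cost <- sum(step_cost)
print(round(path_cost, 4))
```

The per-pair costs sum to about 29.2764, which agrees with the DISTANCE value in result[[1]].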