hanaml.DTW.Rd
hanaml.DTW is an R wrapper
for SAP HANA PAL DTW.
hanaml.DTW(
query.data,
ref.data,
radius = NULL,
distance.level = NULL,
minkowski.power = NULL,
alignment.method = NULL,
step.pattern = NULL,
save.alignment = NULL,
thread.ratio = NULL
)
DataFrame
DataFrame containing the time-series data for query, expected to be structured as follows:
1st column: ID of query series, type INTEGER, VARCHAR or NVARCHAR
2nd column: Order of time series, type INTEGER, VARCHAR or NVARCHAR
Other columns: Series data, type INTEGER, DOUBLE or DECIMAL(p,s)
DataFrame
DataFrame containing the time-series data for reference, expected to be structured as follows:
1st column: ID of reference series, type INTEGER, VARCHAR or NVARCHAR
2nd column: Order of time series, type INTEGER, VARCHAR or NVARCHAR
Other columns: Series data, type INTEGER, DOUBLE or DECIMAL(p,s)
ref.data must have the same number of columns as query.data.
integer, optional
A constraint that restricts the match curve to an area near the diagonal.
-1 means no such constraint; otherwise the number must be nonnegative.
Setting this constraint may yield a suboptimal result in exchange for a shorter runtime.
An inappropriate value may lead to no result at all
(e.g. 0 for two time-series of different sizes).
Defaults to -1.
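The effect of radius on two series of different lengths can be illustrated with a small base R sketch. It assumes a Sakoe-Chiba style band in which cell (i, j) of the alignment grid is allowed only when |i - j| <= radius; the exact band definition used by PAL is not stated here, and in_band() is a hypothetical helper, not part of hanaml.

```r
# Hypothetical helper: is cell (i, j) inside the band?
# Assumption: the band is defined by |i - j| <= radius; -1 disables it.
in_band <- function(i, j, radius) {
  if (radius < 0) {
    return(TRUE)  # no constraint
  }
  abs(i - j) <= radius
}

n <- 10  # length of the query series
m <- 11  # length of the reference series

# A "closed" alignment must end at cell (n, m).
in_band(n, m, radius = 0)   # FALSE: end point lies outside the band
in_band(n, m, radius = 1)   # TRUE
in_band(n, m, radius = -1)  # TRUE
```

Under this assumption, radius = 0 only admits the diagonal, so series of lengths 10 and 11 cannot be aligned end to end.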
c("manhattan", "euclidean", "minkowski",
"chebyshev", "cosine"), optional
Specifies the method used to compute distance between two points.
"manhattan": Manhattan distance (L1 norm)
"euclidean": Euclidean distance (L2 norm)
"minkowski": Minkowski distance (p-norm)
"chebyshev": Chebyshev distance (maximum norm)
"cosine": Cosine distance
Defaults to "euclidean".
double, optional
Only valid when distance.level is "minkowski".
Specifies the power value of Minkowski p-norm.
Defaults to 3.0.
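For reference, the point-wise distances named by distance.level can be sketched in base R as follows. This is an illustration using the textbook definitions, not the PAL implementation; point_distance() is a hypothetical helper, and taking cosine distance as 1 minus cosine similarity is an assumption.

```r
# Hypothetical helper: distance between two points x and y (numeric vectors).
# p mirrors minkowski.power and is only used for method = "minkowski".
point_distance <- function(x, y, method = "euclidean", p = 3) {
  d <- abs(x - y)
  switch(method,
    manhattan = sum(d),
    euclidean = sqrt(sum(d^2)),
    minkowski = sum(d^p)^(1 / p),
    chebyshev = max(d),
    # Assumption: cosine distance = 1 - cosine similarity
    cosine    = 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))),
    stop("unknown method: ", method)
  )
}

x <- c(1, 2)
y <- c(4, 6)  # component-wise differences are 3 and 4

point_distance(x, y, "manhattan")  # 7
point_distance(x, y, "euclidean")  # 5
point_distance(x, y, "chebyshev")  # 4
point_distance(x, y, "minkowski", p = 3)
point_distance(x, y, "cosine")
```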
character, optional
Specifies the alignment method for begin/end points of time-series.
Valid options include:
"closed": both begin and end points must be aligned.
"open_end": only the begin point needs to be aligned.
"open_begin": only the end point needs to be aligned.
"open": neither the begin nor the end point needs to be aligned.
Defaults to "closed".
integer or list, optional
Specifies the step pattern for DTW calculation.
Integers refer to pre-defined step patterns, ranging from 1 to 5.
Lists are for custom defined step patterns, where each element is a step.
For example, the predefined step pattern 1 can be written in custom defined step
pattern as follows:
list(c(1,0,1), c(1,1,1), c(0,1,1)),
while predefined step pattern 5 can be written as:
list(c(1,1,1,1,0,1), c(1,1,1), c(1,1,0.5,0,1,0.5)).
Note: Each step can be a simple step, or an intricate one composed of several simple steps
executed consecutively. Each simple step is represented by 3 numbers (i.e. a triad),
with the first two numbers representing the movement along the query and reference index
respectively, and the 3rd number representing the weight of this simple step.
Defaults to 3.
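The triad layout described in the note can be made concrete by splitting each step of a custom pattern into its simple steps. The sketch below uses base R only; split_step() and the column names are hypothetical and serve only to label the three numbers of each triad.

```r
# Hypothetical helper: split one step into its triads
# (query movement, reference movement, weight), one triad per row.
split_step <- function(step) {
  stopifnot(length(step) %% 3 == 0)
  m <- matrix(step, ncol = 3, byrow = TRUE)
  colnames(m) <- c("query_move", "ref_move", "weight")
  m
}

# Custom form of predefined step pattern 5, as given above
pattern5 <- list(c(1, 1, 1, 1, 0, 1), c(1, 1, 1), c(1, 1, 0.5, 0, 1, 0.5))

steps <- lapply(pattern5, split_step)
steps[[1]]  # two simple steps: (1, 1) and (1, 0), each with weight 1

# Total movement and total weight of each step (one row per step)
totals <- t(sapply(steps, colSums))
totals
```

Here the first step advances the query index by 2 and the reference index by 1 in total, the second step is a plain diagonal move, and the third advances the query index by 1 and the reference index by 2.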
logical, optional
Specifies whether or not to output alignment information.
If set to FALSE, the alignment table will be empty.
Defaults to FALSE.
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread,
and 1 indicates all available threads. Values between 0 and 1 will use up to
that percentage of available threads.
Values outside the range from 0 to 1 are ignored, and the actual number of threads
used is then heuristically determined.
Defaults to -1.
Returns a list of DataFrames.
DataFrame
Result for DTW, structured as follows:
QUERY_<ID column of query data> : ID of time-series for query.
REF_<ID column of reference data> : ID of time-series for reference.
DISTANCE : DTW distance of the two time-series.
WEIGHT : Total weight of match.
AVG_DISTANCE : Normalized distance of the two time-series.
DataFrame
Alignment (optimal match) between input time-series, structured as follows:
QUERY_<ID column of query data> : ID of time-series for query.
REF_<ID column of reference data> : ID of time-series for reference.
QUERY_INDEX : Corresponding to index(timestamp) of query data.
REF_INDEX : Corresponding to index(timestamp) of reference data.
DataFrame
Statistics for time series, structured as follows:
STAT_NAME : Statistics name
STAT_VALUE : Statistics value
Dynamic Time Warping (DTW) is a method for measuring similarity between two
time series that may vary in speed. It matches one series to the other
as closely as possible by stretching or compressing one or both series.
It can be used for pattern matching and anomaly detection.
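The stretching and compressing described above can be sketched with a minimal dynamic-programming DTW in base R. This is a textbook illustration for univariate series, assuming absolute difference as the point distance, the three simple steps (1,0), (1,1), (0,1) with unit weight, and a "closed" alignment; it is not the PAL implementation, and dtw_distance() is a hypothetical helper.

```r
# Hypothetical helper: DTW distance between numeric vectors q and r.
dtw_distance <- function(q, r) {
  n <- length(q)
  m <- length(r)
  # cost[i + 1, j + 1] holds the best cumulative cost of aligning
  # q[1..i] with r[1..j]; the extra row and column act as a border.
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(q[i] - r[j])
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1],  # advance query only
                                    cost[i + 1, j],  # advance reference only
                                    cost[i, j])      # advance both
    }
  }
  cost[n + 1, m + 1]
}

a <- c(1, 2, 3, 4)
b <- c(1, 1, 2, 3, 3, 4)  # same shape as a, stretched in time

dtw_distance(a, b)              # 0: stretching a reproduces b exactly
dtw_distance(a, c(1, 2, 3, 5))  # 1: only the last points differ
```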
Input DataFrame:
> query.data$Collect()
ID TIMESTAMP ATTR1 ATTR2
1 1 1 1 5.2
2 1 2 2 5.1
3 1 3 3 2.0
4 1 4 4 0.3
5 1 5 5 1.2
6 1 6 6 7.7
7 1 7 7 0.0
8 1 8 8 1.1
9 1 9 9 3.2
10 1 10 10 2.3
11 2 1 7 2.0
12 2 2 6 1.4
13 2 3 1 0.9
14 2 4 3 1.2
15 2 5 2 10.2
16 2 6 5 2.3
17 2 7 4 4.5
18 2 8 3 4.6
19 2 9 3 3.5
> ref.data$Collect()
ID TIMESTAMP ATTR1 ATTR2
1 3 1 10 1.0
2 3 2 5 2.0
3 3 3 2 3.0
4 3 4 8 1.4
5 3 5 1 10.8
6 3 6 5 7.7
7 3 7 5 6.3
8 3 8 12 2.4
9 3 9 20 9.4
10 3 10 4 0.5
11 3 11 6 2.2
Call the function:
> output <- hanaml.DTW(query.data,
                       ref.data,
                       radius = -1,
                       thread.ratio = 1,
                       distance.level = "euclidean",
                       step.pattern = list(c(1,1,1,1,0,1),
                                           c(1,1,1),
                                           c(1,1,0.5,0,1,0.5)),
                       alignment.method = "closed",
                       save.alignment = TRUE)
Results:
> output[["alignment"]]$Collect()
QUERY_ID REF_ID QUERY_INDEX REF_INDEX
1 1 3 0 0
2 1 3 1 1
3 1 3 2 2
4 1 3 3 2
5 1 3 4 3
6 1 3 5 4
7 1 3 5 5
8 1 3 6 6
9 1 3 6 7
10 1 3 7 8
11 1 3 7 9
12 1 3 8 10
13 1 3 9 10
14 2 3 0 0
15 2 3 1 1
16 2 3 2 2
17 2 3 3 3
18 2 3 4 4
19 2 3 4 5
20 2 3 5 6
21 2 3 6 6
22 2 3 7 7
23 2 3 7 8
24 2 3 8 9
25 2 3 8 10