Class for Bayesian structural time-series (BSTS)

hanaml.BSTS(
  data = NULL,
  key = NULL,
  endog = NULL,
  exog = NULL,
  burn = NULL,
  niter = NULL,
  seasonal.period = NULL,
  expected.model.size = NULL,
  seed = NULL
)

Arguments

data

DataFrame
Input DataFrame containing the time-series data for BSTS training.

key

character
The ID column that represents the order of the time-series values in data.

endog

character, optional
The column in data that holds the endogenous variable (i.e. the time-series values) for BSTS modeling.
Defaults to the 1st non-key column in data.

exog

character or list of characters, optional
The names of the columns in data that hold the exogenous variables.

burn

double, optional
Specifies the fraction of the total MCMC (Markov chain Monte Carlo) draws that is discarded from the beginning, ranging from 0 to 1. In other words, only the last 1 - burn portion of the total MCMC draws is kept (in the model) for prediction.
Defaults to 0.5.

niter

integer, optional
Specifies the total number of MCMC draws.
Defaults to 1000.
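
As a rough illustration of how burn and niter interact, the following base R sketch computes how many draws are discarded and how many are retained for the values used in the Examples section. The rounding rule is an assumption made for illustration; the exact split applied internally is not documented here.

```r
# Arguments as used in the Examples section of this page.
niter <- 2000   # total number of MCMC draws
burn  <- 0.6    # fraction of draws discarded from the beginning

# Assumption: the split is niter * burn, rounded to a whole number of draws.
n.discarded <- round(niter * burn)
n.retained  <- niter - n.discarded

cat("discarded:", n.discarded, "retained:", n.retained, "\n")
```

Under that assumption, 1200 draws are dropped and 800 are kept for prediction.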

seasonal.period

integer, optional
Specifies the value of the seasonal period.
Defaults to -1.

expected.model.size

integer, optional
Specifies the number of contemporaneous data columns that are expected to be included in the model.
Defaults to half of the number of contemporaneous data columns.

Value

A **hanaml.BSTS** object with the following attributes:

  • stats: DataFrame
    Statistics on the inclusion of the contemporaneous data columns in the model, structured as follows:

    • 1st column : DATA_NAME, type VARCHAR or NVARCHAR,

    • 2nd column : INCLUSION_PROB, type DOUBLE,

    • 3rd column : AVG_COEFF, type DOUBLE

  • decompose: DataFrame
    Decomposed components of the target time-series, structured as follows:

    • 1st column : TIME_STAMP, type INTEGER,

    • 2nd column : TREND, type DOUBLE,

    • 3rd column : SEASONAL, type DOUBLE,

    • 4th column : REGRESSION, type DOUBLE,

    • 5th column : RANDOM, type DOUBLE

  • model: DataFrame
    DataFrame containing the retained tail MCMC samples in a JSON string, structured as follows:

    • 1st column : ROW_INDEX, type INTEGER,

    • 2nd column : MODEL_CONTENT, type NVARCHAR

Details

BSTS can be seen as a combination of three Bayesian methods: the Kalman filter, spike-and-slab regression, and Bayesian model averaging. In particular, samples of the model parameters are drawn from their posterior distributions using MCMC.
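
To make the component structure more concrete, the base R sketch below simulates a series as the sum of a trend, a seasonal term, a regression term and random noise, mirroring the column names of the decompose attribute. It only illustrates an additive structural form, which is assumed here; it does not run a Kalman filter, spike-and-slab regression, or MCMC. The regressor x, its coefficient 0.4, and the noise scales are hypothetical.

```r
set.seed(1)    # arbitrary seed for reproducibility
n      <- 50   # series length, matching the example data on this page
period <- 12   # seasonal period, as in the example call

trend      <- cumsum(rnorm(n, sd = 0.1))               # random-walk level
seasonal   <- sin(2 * pi * (seq_len(n) - 1) / period)  # fixed seasonal pattern
x          <- rnorm(n)                                 # one hypothetical regressor
regression <- 0.4 * x                                  # hypothetical coefficient
random     <- rnorm(n, sd = 0.2)                       # irregular noise

# Same column layout as the decompose attribute (assumed additive).
components <- data.frame(TIME_STAMP = seq_len(n) - 1,
                         TREND = trend, SEASONAL = seasonal,
                         REGRESSION = regression, RANDOM = random)

# The simulated target series is the sum of its components.
y <- with(components, TREND + SEASONAL + REGRESSION + RANDOM)
head(cbind(components, TARGET = y), 3)
```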

Examples

Input data:


> data
  TIME_STAMP  TARGET_SERIES  FEATURE_01  FEATURE_02  ...  FEATURE_07  FEATURE_08  FEATURE_09  FEATURE_10
0            0          2.536       1.488      -0.561  ...       0.300       1.750       0.498       0.073
1            1          0.882       1.100      -0.992  ...       0.180      -0.011       0.264       0.584
2            2         -0.077       1.155      -1.212  ...       0.119      -0.028       0.031       0.448
3            3          0.135       0.530      -1.034  ...       0.727      -0.230      -0.143      -0.269
4            4          0.373       0.698      -1.195  ...       0.598       0.625      -0.219      -1.006
5            5         -0.437       0.441      -1.386  ...      -0.199      -0.401      -0.526      -1.124
6            6         -0.556       0.405      -0.844  ...      -0.245      -0.976      -0.699      -0.504
7            7         -0.432      -0.016      -1.001  ...      -0.871      -1.236      -0.884      -1.254
8            8         -0.460       0.271      -1.234  ...      -0.359      -0.555      -0.778      -2.114
9            9         -0.698      -0.357      -1.269  ...      -1.116       0.156      -1.182      -2.958
10          10         -0.765      -0.006      -1.326  ...      -0.276       0.158      -0.917      -0.939
11          11         -0.833      -0.647      -2.124  ...      -0.978      -0.572      -1.158      -1.758
12          12         -0.767      -0.282      -1.615  ...      -0.444      -1.992      -0.898      -0.831
13          13         -0.356      -0.503      -1.035  ...      -0.397      -0.897      -0.844      -0.425
14          14         -0.496      -0.998      -1.356  ...      -0.669      -0.338      -1.145      -1.210
15          15         -0.684      -0.618      -1.060  ...      -0.805      -0.373      -1.040      -0.868
16          16         -0.953      -0.547      -1.437  ...      -0.504      -0.512      -0.898      -1.441
17          17         -0.869      -0.403      -1.360  ...      -0.636       0.065      -1.069      -0.929
18          18         -0.831      -0.691      -1.553  ...      -0.626      -0.489      -0.858      -1.033
...
47          47          0.730      -0.282      -1.019  ...      -0.511      -1.127      -0.792      -0.368
48          48         -0.181      -0.145      -0.585  ...      -0.939      -0.388      -1.062      -0.547
49          49         -0.144      -0.120      -0.496  ...      -0.856      -1.313      -1.161       0.150

> bs <- hanaml.BSTS(data = data,
                    key = "TIME_STAMP",
                    burn = 0.6, expected.model.size = 2, niter = 2000,
                    seasonal.period = 12, seed = 1)

> bs$stats
    DATA_NAME  INCLUSION_PROB  AVG_COEFF
0  FEATURE_08         0.48500   0.173861
1  FEATURE_01         0.40250   0.437837
2  FEATURE_07         0.24625   0.189362
3  FEATURE_09         0.23375   0.081339
4  FEATURE_02         0.19750   0.098693
5  FEATURE_04         0.14375   0.130138
6  FEATURE_06         0.14125   0.062544
7  FEATURE_10         0.10375   0.003327
8  FEATURE_03         0.08875   0.009415
9  FEATURE_05         0.08750   0.021849
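
One way to read this output is sketched below in base R, using a local copy of the table transcribed from the result above. Two assumptions are made: that niter = 2000 with burn = 0.6 leaves 800 retained draws, and that INCLUSION_PROB is the fraction of retained draws in which a feature is included. The printed values are consistent with that reading, since each probability times 800 is a whole number. The 0.2 cut-off is purely illustrative and not part of the API.

```r
# Local copy of the bs$stats output shown above (values transcribed from this page).
stats <- data.frame(
  DATA_NAME = c("FEATURE_08", "FEATURE_01", "FEATURE_07", "FEATURE_09", "FEATURE_02",
                "FEATURE_04", "FEATURE_06", "FEATURE_10", "FEATURE_03", "FEATURE_05"),
  INCLUSION_PROB = c(0.48500, 0.40250, 0.24625, 0.23375, 0.19750,
                     0.14375, 0.14125, 0.10375, 0.08875, 0.08750),
  AVG_COEFF = c(0.173861, 0.437837, 0.189362, 0.081339, 0.098693,
                0.130138, 0.062544, 0.003327, 0.009415, 0.021849),
  stringsAsFactors = FALSE
)

# Assumption: 2000 draws with burn = 0.6 leave 800 retained draws.
n.retained <- 800
stats$N_INCLUDED <- round(stats$INCLUSION_PROB * n.retained)

# Illustrative cut-off (not part of the API): keep features that appear in
# more than 20% of the retained draws.
threshold <- 0.2
selected <- stats[stats$INCLUSION_PROB > threshold, ]
print(selected)
```

With this cut-off, FEATURE_08, FEATURE_01, FEATURE_07 and FEATURE_09 are kept.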
  

See also