BSTS

Class for Bayesian structural time series (BSTS).

hanaml.BSTS(
  data = NULL,
  key = NULL,
  endog = NULL,
  exog = NULL,
  burn = NULL,
  niter = NULL,
  seasonal.period = NULL,
  expected.model.size = NULL,
  seed = NULL
)

Arguments

data: DataFrame
Input DataFrame containing the time-series data for BSTS training.
key: character
Name of the ID column that represents the order of the time-series values in data.
endog: character, optional
The column in data that holds the endogenous variable (i.e. the time-series values) for BSTS modeling.
Defaults to the 1st non-key column in data.
exog: character or list of characters, optional
Name(s) of the column(s) in data that hold the exogenous variables.
burn: double, optional
Specifies the fraction of the total MCMC draws that is discarded from the beginning (burn-in), ranging from 0 to 1. In other words, only the last 1 - burn portion of the total MCMC draws is kept in the model for prediction.
Defaults to 0.5.
niter: integer, optional
Specifies the total number of MCMC draws.
Defaults to 1000.
seasonal.period: integer, optional
Specifies the value of seasonal period.
Defaults to -1.
expected.model.size: integer, optional
Specifies the number of contemporaneous data columns that are expected to be included in the model.
Defaults to half of the number of contemporaneous data columns.
seed: integer, optional
Specifies the seed for the random number generator used in MCMC sampling.

Value

A **hanaml.BSTS** object with the following attributes:

stats: DataFrame
Statistics on the inclusion of contemporaneous data in the model, structured as follows:
- 1st column : DATA_NAME, type NVARCHAR,
- 2nd column : INCLUSION_PROB, type DOUBLE,
- 3rd column : AVG_COEFF, type DOUBLE
decompose: DataFrame
Decomposed components of the target time series, structured as follows:
- 1st column : TIME_STAMP, type INTEGER,
- 2nd column : TREND, type DOUBLE
- 3rd column : SEASONAL, type DOUBLE,
- 4th column : REGRESSION, type DOUBLE,
- 5th column : RANDOM, type DOUBLE
model: DataFrame
DataFrame containing the retained tail MCMC samples in a JSON string, structured as follows:
- 1st column : ROW_INDEX, type INTEGER,
- 2nd column : MODEL_CONTENT, type NVARCHAR

Details

BSTS can be seen as a combination of three Bayesian methods: the Kalman filter, spike-and-slab regression, and Bayesian model averaging. In particular, samples of the model parameters are drawn from their posterior distributions using MCMC.
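
To illustrate the Kalman filter ingredient only, here is a minimal base-R sketch for the simplest structural model, a local level (random-walk trend observed with noise). It is not the implementation used by hanaml.BSTS: the function name local_level_filter and the fixed variances var_obs and var_level are assumptions made for illustration, whereas in BSTS such variances would be sampled by MCMC rather than fixed.

```r
# Local-level model (illustrative assumption, not the hanaml internals):
#   y[t]    = mu[t] + eps[t],  eps ~ N(0, var_obs)
#   mu[t+1] = mu[t] + eta[t],  eta ~ N(0, var_level)
local_level_filter <- function(y, var_obs, var_level, m0 = 0, p0 = 1e6) {
  n <- length(y)
  level <- numeric(n)      # filtered means of mu[t] given y[1..t]
  level_var <- numeric(n)  # filtered variances of mu[t] given y[1..t]
  m <- m0                  # diffuse-like prior mean
  p <- p0                  # large prior variance
  for (t in seq_len(n)) {
    # Predict: a random-walk level keeps its mean and gains variance
    p_pred <- p + var_level
    # Update: blend the prediction and the observation via the Kalman gain
    k <- p_pred / (p_pred + var_obs)
    m <- m + k * (y[t] - m)
    p <- (1 - k) * p_pred
    level[t] <- m
    level_var[t] <- p
  }
  data.frame(level = level, level_var = level_var)
}

# First values of TARGET_SERIES from the example data; variances are made up
y <- c(2.536, 0.882, -0.077, 0.135, 0.373, -0.437)
filtered <- local_level_filter(y, var_obs = 0.5, var_level = 0.1)
print(filtered)
```

The filtered level is a smoothed version of the series; a larger var_level lets the level follow the observations more closely.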

Examples

Input data:


> data
  TIME_STAMP  TARGET_SERIES  FEATURE_01  FEATURE_02  ...  FEATURE_07  FEATURE_08  FEATURE_09  FEATURE_10
0            0          2.536       1.488      -0.561  ...       0.300       1.750       0.498       0.073
1            1          0.882       1.100      -0.992  ...       0.180      -0.011       0.264       0.584
2            2         -0.077       1.155      -1.212  ...       0.119      -0.028       0.031       0.448
3            3          0.135       0.530      -1.034  ...       0.727      -0.230      -0.143      -0.269
4            4          0.373       0.698      -1.195  ...       0.598       0.625      -0.219      -1.006
5            5         -0.437       0.441      -1.386  ...      -0.199      -0.401      -0.526      -1.124
6            6         -0.556       0.405      -0.844  ...      -0.245      -0.976      -0.699      -0.504
7            7         -0.432      -0.016      -1.001  ...      -0.871      -1.236      -0.884      -1.254
8            8         -0.460       0.271      -1.234  ...      -0.359      -0.555      -0.778      -2.114
9            9         -0.698      -0.357      -1.269  ...      -1.116       0.156      -1.182      -2.958
10          10         -0.765      -0.006      -1.326  ...      -0.276       0.158      -0.917      -0.939
11          11         -0.833      -0.647      -2.124  ...      -0.978      -0.572      -1.158      -1.758
12          12         -0.767      -0.282      -1.615  ...      -0.444      -1.992      -0.898      -0.831
13          13         -0.356      -0.503      -1.035  ...      -0.397      -0.897      -0.844      -0.425
14          14         -0.496      -0.998      -1.356  ...      -0.669      -0.338      -1.145      -1.210
15          15         -0.684      -0.618      -1.060  ...      -0.805      -0.373      -1.040      -0.868
16          16         -0.953      -0.547      -1.437  ...      -0.504      -0.512      -0.898      -1.441
17          17         -0.869      -0.403      -1.360  ...      -0.636       0.065      -1.069      -0.929
18          18         -0.831      -0.691      -1.553  ...      -0.626      -0.489      -0.858      -1.033
...
47          47          0.730      -0.282      -1.019  ...      -0.511      -1.127      -0.792      -0.368
48          48         -0.181      -0.145      -0.585  ...      -0.939      -0.388      -1.062      -0.547
49          49         -0.144      -0.120      -0.496  ...      -0.856      -1.313      -1.161       0.150


> bs <- hanaml.BSTS(data = data,
                    key = "TIME_STAMP",
                    burn = 0.6, expected.model.size = 2, niter = 2000,
                    seasonal.period = 12, seed = 1)
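
To make the effect of burn and niter in the call above concrete, the following base-R sketch computes how many MCMC draws are discarded and how many are retained. The helper retained_draws is hypothetical and not part of hanaml, and rounding a non-integer product is an assumption, since the argument descriptions do not say how that case is handled.

```r
# Hypothetical helper (not part of hanaml): split niter draws into
# burn-in and retained draws according to the burn ratio.
retained_draws <- function(niter, burn) {
  stopifnot(niter > 0, burn >= 0, burn <= 1)
  discarded <- round(niter * burn)  # assumption: non-integer products are rounded
  c(discarded = discarded, retained = niter - discarded)
}

example_call <- retained_draws(niter = 2000, burn = 0.6)  # settings used above
defaults <- retained_draws(niter = 1000, burn = 0.5)      # documented defaults
print(example_call)
print(defaults)
```

With the settings above, 1200 draws are discarded and 800 are kept; with the documented defaults, 500 are kept.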


> bs$stats
    DATA_NAME  INCLUSION_PROB  AVG_COEFF
0  FEATURE_08         0.48500   0.173861
1  FEATURE_01         0.40250   0.437837
2  FEATURE_07         0.24625   0.189362
3  FEATURE_09         0.23375   0.081339
4  FEATURE_02         0.19750   0.098693
5  FEATURE_04         0.14375   0.130138
6  FEATURE_06         0.14125   0.062544
7  FEATURE_10         0.10375   0.003327
8  FEATURE_03         0.08875   0.009415
9  FEATURE_05         0.08750   0.021849
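
One way to read this table is sketched below in base R, under the assumption that INCLUSION_PROB is the share of retained MCMC draws in which a variable is included (the usual spike-and-slab interpretation; the source does not state it explicitly). The values are copied from the output above, and the column DRAWS_INCLUDED is introduced here only for illustration.

```r
# INCLUSION_PROB values copied from the bs$stats output above
stats <- data.frame(
  DATA_NAME = sprintf("FEATURE_%02d", c(8, 1, 7, 9, 2, 4, 6, 10, 3, 5)),
  INCLUSION_PROB = c(0.48500, 0.40250, 0.24625, 0.23375, 0.19750,
                     0.14375, 0.14125, 0.10375, 0.08875, 0.08750)
)

# Retained draws for niter = 2000 and burn = 0.6
retained <- 2000 * (1 - 0.6)

# Assumed interpretation: number of retained draws that include each variable
stats$DRAWS_INCLUDED <- stats$INCLUSION_PROB * retained

# The sum of inclusion probabilities is the average number of included variables
posterior_model_size <- sum(stats$INCLUSION_PROB)

print(stats)
print(posterior_model_size)
```

Under this reading each product is a whole number of draws (for example, 388 of 800 for FEATURE_08), and the probabilities sum to 2.13, which is close to the expected.model.size = 2 passed in the call.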
