BSTS

hana_ml.algorithms.pal.tsa.bsts.BSTS(burn=None, niter=None, seasonal_period=None, expected_model_size=None, seed=None)

class for Bayesian structure time-series(BSTS). Basically, let \(y_t\) denote the observed value at time t in a real-valued time-series, a generic structural time series model can be described by a pair of equations relating \(y_t\) to a vector of latent state variables \(\alpha_t\) as follows:

  • \(y_t = Z_t^T\alpha_t + \epsilon_t, \epsilon_t\sim N(0, H_t)\)

  • \(\alpha_t = T_t\alpha_t + R_t\eta_t, \eta_t \sim N(0, Q_t)\)

In this class, a special structural time-series model is considered, with system equation stated as follows:

  • \(y_t = \mu_t + \tau_t + \beta^T \bf{x}_t + \epsilon_t\),

  • \(\mu_t = \mu_{t-1} + \delta_t + u_t\),

  • \(\delta_t = \delta_{t-1} + v_t\),

  • \(\tau_t = -\sum_{s=1}^{S-1}\tau_{t-s} + w_t\),

where \(\mu_t, \delta_t, \tau_t\) and \(\beta^T\bf{x}_t\) are the trend, slope of trend, seasonal(with period S) and regression components w.r.t. contemporaneous data, respectively, \(\epsilon_t, u_t, v_t\) and \(w_t\) are independent Gaussian random variables.

BSTS can be seen as a combination of three Bayesian methods altogether - Kalman filter, spike-and-slab regression and Bayesian model averaging. In particular, samples of model parameters are drawn from its posterior distributions using MCMC.

Parameters
burnfloat, optional

Specifies the ratio of total MCMC draws that are neglected from the beginning. Ranging from 0 to 1. In other words, only the tail 1-burn portion of the total MCMC draw is kept(in the model) for prediction.

Defaults to 0.5.

niterint, optional

Specifies the total number of MCMC draws.

Defaults to 1000.

seasonal_periodint, optional

Specifies the value of seasonal period.

  • Negative value : Period determined automatically

  • 0 or 1 : Target time-series assumed non-seasonal

  • 2 or larger : The specified value of seasonal period

Defaults to -1, i.e. determined automatically.

expected_model_sizeint, optional

Specifies the number of contemporaneous data that are expected to be included in the model.

Defaults to half of the number of contemporaneous data columns.

Examples

>>> data.collect()
    TIME_STAMP  TARGET_SERIES  FEATURE_01  FEATURE_02  ...  FEATURE_07  FEATURE_08  FEATURE_09  FEATURE_10
0            0          2.536       1.488      -0.561  ...       0.300       1.750       0.498       0.073
1            1          0.882       1.100      -0.992  ...       0.180      -0.011       0.264       0.584
2            2         -0.077       1.155      -1.212  ...       0.119      -0.028       0.031       0.448
3            3          0.135       0.530      -1.034  ...       0.727      -0.230      -0.143      -0.269
4            4          0.373       0.698      -1.195  ...       0.598       0.625      -0.219      -1.006
5            5         -0.437       0.441      -1.386  ...      -0.199      -0.401      -0.526      -1.124
6            6         -0.556       0.405      -0.844  ...      -0.245      -0.976      -0.699      -0.504
7            7         -0.432      -0.016      -1.001  ...      -0.871      -1.236      -0.884      -1.254
8            8         -0.460       0.271      -1.234  ...      -0.359      -0.555      -0.778      -2.114
9            9         -0.698      -0.357      -1.269  ...      -1.116       0.156      -1.182      -2.958
10          10         -0.765      -0.006      -1.326  ...      -0.276       0.158      -0.917      -0.939
11          11         -0.833      -0.647      -2.124  ...      -0.978      -0.572      -1.158      -1.758
12          12         -0.767      -0.282      -1.615  ...      -0.444      -1.992      -0.898      -0.831
13          13         -0.356      -0.503      -1.035  ...      -0.397      -0.897      -0.844      -0.425
14          14         -0.496      -0.998      -1.356  ...      -0.669      -0.338      -1.145      -1.210
15          15         -0.684      -0.618      -1.060  ...      -0.805      -0.373      -1.040      -0.868
16          16         -0.953      -0.547      -1.437  ...      -0.504      -0.512      -0.898      -1.441
17          17         -0.869      -0.403      -1.360  ...      -0.636       0.065      -1.069      -0.929
18          18         -0.831      -0.691      -1.553  ...      -0.626      -0.489      -0.858      -1.033
...
47          47          0.730      -0.282      -1.019  ...      -0.511      -1.127      -0.792      -0.368
48          48         -0.181      -0.145      -0.585  ...      -0.939      -0.388      -1.062      -0.547
49          49         -0.144      -0.120      -0.496  ...      -0.856      -1.313      -1.161       0.150
>>> bt = BSTS(burn=0.6, expected_model_size=2, niter=2000, seasonal_period=12, seed=1)
>>> bt.fit(data=data, key='TIME_STAMP')
>>> bt.stats_.collect()
>>> bt.stats_.collect()
    DATA_NAME  INCLUSION_PROB  AVG_COEFF
0  FEATURE_08         0.48500   0.173861
1  FEATURE_01         0.40250   0.437837
2  FEATURE_07         0.24625   0.189362
3  FEATURE_09         0.23375   0.081339
4  FEATURE_02         0.19750   0.098693
5  FEATURE_04         0.14375   0.130138
6  FEATURE_06         0.14125   0.062544
7  FEATURE_10         0.10375   0.003327
8  FEATURE_03         0.08875   0.009415
9  FEATURE_05         0.08750   0.021849
>>> data_pred.collect()
   TIME_STAMP  FEATURE_01  FEATURE_02  FEATURE_03  ...  FEATURE_07  FEATURE_08  FEATURE_09  FEATURE_10
0          50       0.471      -0.660      -0.086  ...      -1.107      -0.559      -1.404      -1.646
1          51       0.872       0.062       0.481  ...      -0.729       0.894      -0.754       1.107
2          52       0.976      -0.003       0.824  ...      -0.589       0.133       0.007      -0.115
3          53       0.446       0.231       0.098  ...      -0.014       0.182      -0.465      -1.062
4          54       0.248      -0.142       0.174  ...      -0.380       1.236      -0.552      -1.051
5          55      -0.319      -0.867       0.334  ...      -0.160      -0.488      -0.650      -0.769
6          56      -0.194      -0.822       0.523  ...      -0.566      -0.289      -0.596      -0.559
7          57      -0.357      -0.564      -0.391  ...      -0.980       0.578      -0.948      -0.870
8          58      -0.760      -1.113      -0.178  ...      -0.477      -0.705      -1.199      -0.517
9          59      -0.611      -1.163       0.186  ...      -0.976      -0.576      -0.927      -1.577
>>> forecast_, _ = bt.predict(data_pred, key='TIME_STAMP')
>>> forecast_.collect()
   TIME_STAMP  FORECAST        SE  LOWER_80  UPPER_80  LOWER_95  UPPER_95
0          50  0.143151  0.591231 -0.614542  0.900844 -1.015640  1.301943
1          51  0.469405  0.765558 -0.511697  1.450508 -1.031060  1.969871
2          52  0.155813  1.004786 -1.131872  1.443499 -1.813531  2.125158
3          53  0.055188  1.160655 -1.432251  1.542627 -2.219653  2.330029
4          54  0.064481  1.385078 -1.710569  1.839531 -2.650222  2.779185
5          55  0.045844  1.660894 -2.082678  2.174365 -3.209448  3.301135
6          56 -0.039227  1.905115 -2.480732  2.402277 -3.773185  3.694731
7          57  0.124084  2.193157 -2.686560  2.934728 -4.174424  4.422592
8          58 -0.200588  2.479858 -3.378655  2.977478 -5.061020  4.659843
9          59  0.339182  2.763764 -3.202725  3.881089 -5.077696  5.756059
Attributes
stats_DataFrame

Related statistics on the inclusion of contemporaneous data w.r.t. the target time-series, structured as follows:

  • 1st column : DATA_NAME, type NVARCHAR or NVARCHAR, indicating the (column) name of contemporaneous data.

  • 2nd column : INCLUSION_PROB, type DOUBLE, indicating the inclusion probability of each contemporaneous data column.

  • 3rd column : AVG_COEFF, type DOUBLE, indicating the average value of coefficients for each contemporaneous data column if included in the model.

decompose_DataFrame

Decomposed components of the target time-series, structured as follows:

  • 1st column : TIME_STAMP, type INTEGER, representing the order of time-series and is sorted ascendingly.

  • 2nd column : TREND, type DOUBLE, representing the trend component.

  • 3rd column : SEASONAL, type DOUBLE, representing the seasonal component.

  • 4th column : REGRESSION, type DOUBLE, representing the regression component w.r.t. contemporaneous data.

  • 5th column : RANDOM, type DOUBLE, representing the random component.

model_DataFrame

DataFrame containing the retained tail MCMC samples in a JSON string, structured as follows:

  • 1st column : ROW_INDEX, type INTEGER, indicating the ID of current row.

  • 2nd column : MODEL_CONTENT, type NVARCHAR, JSON string.