BSTS
- hana_ml.algorithms.pal.tsa.bsts.BSTS(burn=None, niter=None, seasonal_period=None, expected_model_size=None, seed=None)
Class for Bayesian structural time-series (BSTS). Let \(y_t\) denote the observed value at time t of a real-valued time-series. A generic structural time-series model can be described by a pair of equations relating \(y_t\) to a vector of latent state variables \(\alpha_t\) as follows:
\(y_t = Z_t^T\alpha_t + \epsilon_t, \epsilon_t\sim N(0, H_t)\)
\(\alpha_{t+1} = T_t\alpha_t + R_t\eta_t, \eta_t \sim N(0, Q_t)\)
In this class, a special structural time-series model is considered, with system equations stated as follows:
\(y_t = \mu_t + \tau_t + \beta^T \bf{x}_t + \epsilon_t\),
\(\mu_t = \mu_{t-1} + \delta_t + u_t\),
\(\delta_t = \delta_{t-1} + v_t\),
\(\tau_t = -\sum_{s=1}^{S-1}\tau_{t-s} + w_t\),
where \(\mu_t, \delta_t, \tau_t\) and \(\beta^T\bf{x}_t\) are the trend, slope of trend, seasonal (with period S) and regression components w.r.t. contemporaneous data, respectively, and \(\epsilon_t, u_t, v_t\) and \(w_t\) are independent Gaussian random variables.
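The four component equations above can be read as a simple generative recursion. The following is a minimal sketch of that recursion using only the Python standard library; it is not part of hana_ml, and the function name `simulate_series`, the zero initial states, and the noise standard deviations are all assumptions chosen for illustration.

```python
import random


def simulate_series(n, period, beta, x, sd_eps=0.1, sd_u=0.05,
                    sd_v=0.01, sd_w=0.05, seed=0):
    """Simulate y_t = mu_t + tau_t + beta^T x_t + eps_t for t = 0..n-1.

    Hypothetical helper: initial trend, slope and seasonal states are set
    to zero, which is an assumption of this sketch.
    """
    if period < 2:
        raise ValueError("this sketch assumes a seasonal period S >= 2")
    rng = random.Random(seed)
    mu, delta = 0.0, 0.0
    # The S-1 most recent seasonal values, newest first.
    tau_hist = [0.0] * (period - 1)
    ys = []
    for t in range(n):
        delta = delta + rng.gauss(0.0, sd_v)          # delta_t = delta_{t-1} + v_t
        mu = mu + delta + rng.gauss(0.0, sd_u)        # mu_t = mu_{t-1} + delta_t + u_t
        tau = -sum(tau_hist) + rng.gauss(0.0, sd_w)   # tau_t = -sum_{s=1}^{S-1} tau_{t-s} + w_t
        tau_hist = [tau] + tau_hist[:-1]
        regression = sum(b * xi for b, xi in zip(beta, x[t]))  # beta^T x_t
        ys.append(mu + tau + regression + rng.gauss(0.0, sd_eps))
    return ys


# One contemporaneous regressor held constant at 1.0, seasonal period 12.
series = simulate_series(n=50, period=12, beta=[0.4], x=[[1.0]] * 50)
```

With all noise terms switched off and zero initial states, the simulated series reduces to the regression component alone, which is a quick way to check the recursion.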
BSTS can be seen as a combination of three Bayesian methods: Kalman filter, spike-and-slab regression and Bayesian model averaging. In particular, samples of the model parameters are drawn from their posterior distribution using MCMC.
- Parameters
- burn : float, optional
Specifies the ratio of total MCMC draws that are neglected from the beginning, ranging from 0 to 1. In other words, only the tail 1 - burn portion of the total MCMC draws is kept (in the model) for prediction.
Defaults to 0.5.
- niter : int, optional
Specifies the total number of MCMC draws.
Defaults to 1000.
- seasonal_period : int, optional
Specifies the value of seasonal period.
Negative value : Period determined automatically
0 or 1 : Target time-series assumed non-seasonal
2 or larger : The specified value of seasonal period
Defaults to -1, i.e. determined automatically.
- expected_model_size : int, optional
Specifies the number of contemporaneous data columns that are expected to be included in the model.
Defaults to half of the number of contemporaneous data columns.
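To make the interplay of `niter`, `burn` and `seasonal_period` concrete, here is a small sketch in plain Python. The helper names are hypothetical and not part of hana_ml, and how the library rounds the number of discarded draws is an assumption of this sketch.

```python
def retained_draws(niter=1000, burn=0.5):
    """Number of tail MCMC draws kept when the first `burn` fraction is dropped."""
    if not 0.0 <= burn <= 1.0:
        raise ValueError("burn must lie in [0, 1]")
    # Rounding of the discarded count is an assumption of this sketch.
    return niter - round(niter * burn)


def describe_seasonal_period(seasonal_period=-1):
    """Map a seasonal_period value to the behaviour listed in the parameter description."""
    if seasonal_period < 0:
        return "period determined automatically"
    if seasonal_period in (0, 1):
        return "non-seasonal"
    return "seasonal with period %d" % seasonal_period


kept = retained_draws(niter=2000, burn=0.6)   # 800 draws kept
mode = describe_seasonal_period(12)           # "seasonal with period 12"
```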
Examples
>>> data.collect()
    TIME_STAMP  TARGET_SERIES  FEATURE_01  FEATURE_02  ...  FEATURE_07  FEATURE_08  FEATURE_09  FEATURE_10
 0           0          2.536       1.488      -0.561  ...       0.300       1.750       0.498       0.073
 1           1          0.882       1.100      -0.992  ...       0.180      -0.011       0.264       0.584
 2           2         -0.077       1.155      -1.212  ...       0.119      -0.028       0.031       0.448
 3           3          0.135       0.530      -1.034  ...       0.727      -0.230      -0.143      -0.269
 4           4          0.373       0.698      -1.195  ...       0.598       0.625      -0.219      -1.006
 5           5         -0.437       0.441      -1.386  ...      -0.199      -0.401      -0.526      -1.124
 6           6         -0.556       0.405      -0.844  ...      -0.245      -0.976      -0.699      -0.504
 7           7         -0.432      -0.016      -1.001  ...      -0.871      -1.236      -0.884      -1.254
 8           8         -0.460       0.271      -1.234  ...      -0.359      -0.555      -0.778      -2.114
 9           9         -0.698      -0.357      -1.269  ...      -1.116       0.156      -1.182      -2.958
10          10         -0.765      -0.006      -1.326  ...      -0.276       0.158      -0.917      -0.939
11          11         -0.833      -0.647      -2.124  ...      -0.978      -0.572      -1.158      -1.758
12          12         -0.767      -0.282      -1.615  ...      -0.444      -1.992      -0.898      -0.831
13          13         -0.356      -0.503      -1.035  ...      -0.397      -0.897      -0.844      -0.425
14          14         -0.496      -0.998      -1.356  ...      -0.669      -0.338      -1.145      -1.210
15          15         -0.684      -0.618      -1.060  ...      -0.805      -0.373      -1.040      -0.868
16          16         -0.953      -0.547      -1.437  ...      -0.504      -0.512      -0.898      -1.441
17          17         -0.869      -0.403      -1.360  ...      -0.636       0.065      -1.069      -0.929
18          18         -0.831      -0.691      -1.553  ...      -0.626      -0.489      -0.858      -1.033
...
47          47          0.730      -0.282      -1.019  ...      -0.511      -1.127      -0.792      -0.368
48          48         -0.181      -0.145      -0.585  ...      -0.939      -0.388      -1.062      -0.547
49          49         -0.144      -0.120      -0.496  ...      -0.856      -1.313      -1.161       0.150
>>> from hana_ml.algorithms.pal.tsa.bsts import BSTS
>>> bt = BSTS(burn=0.6, expected_model_size=2, niter=2000, seasonal_period=12, seed=1)
>>> bt.fit(data=data, key='TIME_STAMP')
>>> bt.stats_.collect()
    DATA_NAME  INCLUSION_PROB  AVG_COEFF
0  FEATURE_08         0.48500   0.173861
1  FEATURE_01         0.40250   0.437837
2  FEATURE_07         0.24625   0.189362
3  FEATURE_09         0.23375   0.081339
4  FEATURE_02         0.19750   0.098693
5  FEATURE_04         0.14375   0.130138
6  FEATURE_06         0.14125   0.062544
7  FEATURE_10         0.10375   0.003327
8  FEATURE_03         0.08875   0.009415
9  FEATURE_05         0.08750   0.021849
>>> data_pred.collect()
   TIME_STAMP  FEATURE_01  FEATURE_02  FEATURE_03  ...  FEATURE_07  FEATURE_08  FEATURE_09  FEATURE_10
0          50       0.471      -0.660      -0.086  ...      -1.107      -0.559      -1.404      -1.646
1          51       0.872       0.062       0.481  ...      -0.729       0.894      -0.754       1.107
2          52       0.976      -0.003       0.824  ...      -0.589       0.133       0.007      -0.115
3          53       0.446       0.231       0.098  ...      -0.014       0.182      -0.465      -1.062
4          54       0.248      -0.142       0.174  ...      -0.380       1.236      -0.552      -1.051
5          55      -0.319      -0.867       0.334  ...      -0.160      -0.488      -0.650      -0.769
6          56      -0.194      -0.822       0.523  ...      -0.566      -0.289      -0.596      -0.559
7          57      -0.357      -0.564      -0.391  ...      -0.980       0.578      -0.948      -0.870
8          58      -0.760      -1.113      -0.178  ...      -0.477      -0.705      -1.199      -0.517
9          59      -0.611      -1.163       0.186  ...      -0.976      -0.576      -0.927      -1.577
>>> forecast_, _ = bt.predict(data_pred, key='TIME_STAMP')
>>> forecast_.collect()
   TIME_STAMP   FORECAST        SE   LOWER_80  UPPER_80   LOWER_95  UPPER_95
0          50   0.143151  0.591231  -0.614542  0.900844  -1.015640  1.301943
1          51   0.469405  0.765558  -0.511697  1.450508  -1.031060  1.969871
2          52   0.155813  1.004786  -1.131872  1.443499  -1.813531  2.125158
3          53   0.055188  1.160655  -1.432251  1.542627  -2.219653  2.330029
4          54   0.064481  1.385078  -1.710569  1.839531  -2.650222  2.779185
5          55   0.045844  1.660894  -2.082678  2.174365  -3.209448  3.301135
6          56  -0.039227  1.905115  -2.480732  2.402277  -3.773185  3.694731
7          57   0.124084  2.193157  -2.686560  2.934728  -4.174424  4.422592
8          58  -0.200588  2.479858  -3.378655  2.977478  -5.061020  4.659843
9          59   0.339182  2.763764  -3.202725  3.881089  -5.077696  5.756059
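The interval columns in the forecast output above appear to be symmetric Gaussian intervals around FORECAST with standard error SE. This is an observation from the printed numbers, not a documented guarantee. The sketch below recomputes the bounds of the first forecast row with the standard library; `gaussian_interval` is a hypothetical helper, not a hana_ml function.

```python
from statistics import NormalDist


def gaussian_interval(forecast, se, level):
    """Two-sided normal interval: forecast +/- z * se, with z the (1+level)/2 quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return forecast - z * se, forecast + z * se


# First row of the forecast table (TIME_STAMP 50).
lower_80, upper_80 = gaussian_interval(0.143151, 0.591231, 0.80)
lower_95, upper_95 = gaussian_interval(0.143151, 0.591231, 0.95)
```

Up to rounding of the printed inputs, these values agree with LOWER_80, UPPER_80, LOWER_95 and UPPER_95 in that row. The widening of the intervals with the forecast horizon comes from SE growing from row to row.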
- Attributes
- stats_ : DataFrame
Related statistics on the inclusion of contemporaneous data w.r.t. the target time-series, structured as follows:
1st column : DATA_NAME, type VARCHAR or NVARCHAR, indicating the (column) name of contemporaneous data.
2nd column : INCLUSION_PROB, type DOUBLE, indicating the inclusion probability of each contemporaneous data column.
3rd column : AVG_COEFF, type DOUBLE, indicating the average value of coefficients for each contemporaneous data column if included in the model.
- decompose_ : DataFrame
Decomposed components of the target time-series, structured as follows:
1st column : TIME_STAMP, type INTEGER, representing the time order of the series, sorted in ascending order.
2nd column : TREND, type DOUBLE, representing the trend component.
3rd column : SEASONAL, type DOUBLE, representing the seasonal component.
4th column : REGRESSION, type DOUBLE, representing the regression component w.r.t. contemporaneous data.
5th column : RANDOM, type DOUBLE, representing the random component.
- model_ : DataFrame
DataFrame containing the retained tail MCMC samples in a JSON string, structured as follows:
1st column : ROW_INDEX, type INTEGER, indicating the ID of current row.
2nd column : MODEL_CONTENT, type NVARCHAR, JSON string.
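The INCLUSION_PROB and AVG_COEFF columns of the stats_ attribute can be understood as summaries over the retained MCMC draws. The sketch below shows one plausible way to compute them from a list of sampled coefficient vectors. The input format (one dict per draw, with 0.0 marking an excluded column) and the helper name `inclusion_stats` are assumptions made for illustration; they do not describe the layout of MODEL_CONTENT or the library's internal code.

```python
def inclusion_stats(draws):
    """Summarise spike-and-slab draws.

    draws: list of dicts mapping column name -> sampled coefficient,
           where 0.0 means the column was excluded in that draw (assumed format).
    Returns {name: (inclusion_probability, average_coefficient_when_included)}.
    """
    stats = {}
    for name in draws[0]:
        included = [d[name] for d in draws if d[name] != 0.0]
        prob = len(included) / len(draws)
        # Average only over draws in which the column is part of the model.
        avg = sum(included) / len(included) if included else 0.0
        stats[name] = (prob, avg)
    return stats


# Four hypothetical retained draws over two contemporaneous columns.
toy_draws = [
    {"FEATURE_A": 0.5, "FEATURE_B": 0.0},
    {"FEATURE_A": 0.0, "FEATURE_B": 0.0},
    {"FEATURE_A": 1.5, "FEATURE_B": 0.0},
    {"FEATURE_A": 1.0, "FEATURE_B": 2.0},
]
toy_stats = inclusion_stats(toy_draws)
```

This reading is consistent with the example in this section: with niter=2000 and burn=0.6, 800 draws are retained, and every INCLUSION_PROB shown there is a multiple of 1/800 (for instance 0.48500 = 388/800).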