GARCH
- class hana_ml.algorithms.pal.tsa.garch.GARCH(p=None, q=None, model_type=None)
Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) is a statistical model used to analyze the variance of the error (innovation or residual) term in time series. It is typically used in the analysis of financial data, for example to estimate the volatility of returns for stocks and bonds.
GARCH assumes that the variance of the error term is heteroskedastic, meaning that it is not constant over time; in practice, it tends to cluster. GARCH further assumes that the variance of the error term follows an autoregressive moving average (ARMA) pattern; in other words, it is a weighted combination of past squared errors and past variances.
Assuming a time-series model:
\(y_t = \mu_t + \varepsilon_t\)
where \(\mu_t\) is called the mean model (it can be an ARMA model or just a constant value), the main quantity of interest is \(\sigma_t^2 = var(\varepsilon_t|F_{t-1})\), i.e. the conditional variance of \(\varepsilon_t\), where \(F_{t-1}\) stands for the information set known at time t-1.
Then, a GARCH(p, q) model is defined as:
\(\sigma_t^2 = \alpha_0+\sum_{i=1}^p\alpha_i\varepsilon_{t-i}^2+\sum_{j=1}^q\beta_j\sigma_{t-j}^2\),
where \(\alpha_0 > 0\) and \(\alpha_i \geq 0, \beta_j\geq 0, i \in [1, p], j \in [1, q].\)
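The recursion above can be sketched in plain Python. This is only an illustration of the formula, not the PAL implementation: the function name `conditional_variance` and the choice to initialise pre-sample terms with the sample mean of \(\varepsilon_t^2\) are assumptions made for this example.

```python
def conditional_variance(eps, alpha0, alphas, betas):
    """Return sigma_t^2 for each t under the GARCH(p, q) recursion."""
    if alpha0 <= 0 or any(a < 0 for a in alphas) or any(b < 0 for b in betas):
        raise ValueError("need alpha0 > 0 and all alpha_i, beta_j >= 0")
    p, q = len(alphas), len(betas)
    # Assumption: pre-sample squared errors and variances are replaced by
    # the sample mean of eps^2; other initialisations are possible.
    init = sum(e * e for e in eps) / len(eps)
    sigma2 = []
    for t in range(len(eps)):
        value = alpha0
        # Sum over the p lagged squared error terms.
        for i in range(1, p + 1):
            past_sq_error = eps[t - i] ** 2 if t - i >= 0 else init
            value += alphas[i - 1] * past_sq_error
        # Sum over the q lagged conditional variances.
        for j in range(1, q + 1):
            past_variance = sigma2[t - j] if t - j >= 0 else init
            value += betas[j - 1] * past_variance
        sigma2.append(value)
    return sigma2


# Hypothetical residuals and coefficients, chosen only for illustration.
variances = conditional_variance([0.1, -0.2, 0.3], 0.1, [0.2], [0.7])
```

Here `len(alphas)` plays the role of p and `len(betas)` the role of q.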
In our procedure, it is assumed that \(\mu_t\) has already been subtracted from \(y_t\), so the input time-series is \(\varepsilon_t\) only.
Another assumption is \(P(\varepsilon_t | F_{t-1}) \sim N(0,\sigma_t^2)\), so the model factors (parameters) can be estimated by maximum likelihood estimation (MLE).
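Under the normality assumption, the negative log-likelihood to be minimised is \(\frac{1}{2}\sum_t\left[\ln(2\pi) + \ln\sigma_t^2 + \varepsilon_t^2/\sigma_t^2\right]\). The following is a minimal sketch of that objective; the function name is hypothetical, and how PAL actually optimises it is not described here.

```python
import math


def gaussian_neg_log_likelihood(eps, sigma2):
    """Negative Gaussian log-likelihood of residuals eps given variances sigma2."""
    if len(eps) != len(sigma2):
        raise ValueError("eps and sigma2 must have the same length")
    total = 0.0
    for e, s2 in zip(eps, sigma2):
        # Contribution of one observation with eps_t ~ N(0, sigma_t^2).
        total += 0.5 * (math.log(2.0 * math.pi) + math.log(s2) + e * e / s2)
    return total
```

An optimiser would evaluate this function for candidate coefficients, recomputing the conditional variances each time, and keep the coefficients that give the smallest value.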
- Parameters:
- p : int, optional
Specifies the number of lagged error terms in GARCH model.
Valid only when model_type is not 'igarch'.
Defaults to 1.
- q : int, optional
Specifies the number of lagged variance terms in GARCH model.
Valid only when model_type is not 'igarch'.
Defaults to 1.
- model_type : str, optional
Specifies the variant of GARCH model.
'garch' : the regular GARCH model.
'igarch' : the integrated GARCH model.
'tgarch' : the threshold GARCH model.
'egarch' : the exponential GARCH model.
Defaults to 'garch'.
Examples
>>> df.collect()
    TIME  VAR1  VAR2 VAR3
0      1     2  0.17    A
1      2     2  0.19    A
...
18    19     2  1.32    A
19    20     2  1.10    A
Set up the hyper-parameters and train the GARCH model on the input data:
>>> gh = GARCH(p=1, q=1)
>>> gh.fit(data=df, key='TIME', endog='VAR2')
>>> gh.model_.collect()
   ROW_INDEX                                      MODEL_CONTENT
0          0  {"garch":{"factors":[0.13309395260165602,1.060...
Predicting future volatility of the given time-series data:
>>> pred_res, _ = gh.predict(horizon=5)
>>> pred_res.collect()
   STEP  VARIANCE RESIDUAL
0     1  1.415806     None
1     2  1.633979     None
2     3  1.865262     None
3     4  2.110445     None
4     5  2.370360     None
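To give some intuition for how a multi-step variance forecast can be produced, here is a minimal sketch of the textbook forecast for a GARCH(1, 1) model. The function and its arguments are hypothetical, and whether PAL computes its forecasts in exactly this way is an assumption. For steps beyond the first, the unknown future squared error is replaced by its expected value, which is the forecast variance itself.

```python
def forecast_variance(alpha0, alpha1, beta1, last_eps, last_sigma2, horizon=1):
    """Textbook h-step-ahead variance forecasts for a GARCH(1, 1) model."""
    forecasts = []
    # One step ahead uses the last observed residual and variance.
    current = alpha0 + alpha1 * last_eps ** 2 + beta1 * last_sigma2
    for _ in range(horizon):
        forecasts.append(current)
        # Further ahead, E[eps^2] equals the forecast variance, so the
        # two coefficients act on the same quantity.
        current = alpha0 + (alpha1 + beta1) * current
    return forecasts


# Hypothetical coefficients and last observed values, for illustration only.
steps = forecast_variance(0.1, 0.2, 0.7, last_eps=0.3, last_sigma2=0.2, horizon=3)
```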
- Attributes:
- model_ : DataFrame
Model content.
- variance_ : DataFrame
For storing the variance information of the training data, structured as follows:
1st column : Same name and type as the index(timestamp) column in the training data.
2nd column : VARIANCE, type DOUBLE, representing the conditional variance of residual term.
3rd column : RESIDUAL, type DOUBLE, representing the residual value.
Set to None if the GARCH model is not fitted.
- stats_ : DataFrame
DataFrame for storing the related statistics in fitting GARCH model.
1st column : STAT_NAME, type NVARCHAR(1000)
2nd column : STAT_VALUE, type NVARCHAR(1000)
Methods
fit(data[, key, endog, thread_ratio]) : Fit the model to the training dataset.
get_model_metrics() : Get the model metrics.
get_score_metrics() : Get the score metrics.
predict([horizon]) : Predicts variance of error terms in time series based on trained GARCH model.
- fit(data, key=None, endog=None, thread_ratio=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
Input data for fitting a GARCH model.
data should contain at least 2 columns described as follows:
An index column of INTEGER or TIMESTAMP/DATE/SECONDDATE type, representing the time-order (i.e. timestamp).
A numerical column representing the values of time-series.
- key : str, optional
Specifies the name of the index column in data.
Mandatory if data is not indexed, or indexed by multiple columns.
Defaults to the single index column of data if not provided.
- endog : str, optional
Specifies the name of the column holding values for time-series in data.
Cannot be the key column.
Defaults to the last non-key column in data.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Valid only when show_explainer is True.
Defaults to -1.
- Returns:
- A fitted object of class "GARCH".
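Because the procedure expects a series from which the mean has already been removed, the values in the endog column may need to be centred before fitting. The following is a minimal sketch in plain Python, assuming a constant mean model; the helper `demean` is hypothetical and not part of hana_ml, and a non-constant mean model such as ARMA would require fitting that model and using its residuals instead.

```python
def demean(values):
    """Subtract the sample mean so the result can serve as the error series."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical raw observations y_t, for illustration only.
residuals = demean([1.0, 2.0, 3.0])
```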
- predict(horizon=None)
Predicts variance of error terms in time series based on trained GARCH model.
- Parameters:
- data : DataFrame
Time-series data for predicting the variance of error terms, should contain at least 2 columns described as follows:
An index column of INTEGER/TIMESTAMP type, representing the time-order (i.e. timestamp).
A numerical column representing the values of time-series.
- horizon : int, optional
Specifies the number of steps to be forecasted.
Defaults to 1.
- Returns:
- Two DataFrames
1st DataFrame : the variance information.
2nd DataFrame : statistics.
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
Besides the methods mentioned above, the GARCH class also inherits methods from the PALBase class. Please refer to PAL Base for more details.