GARCH

class hana_ml.algorithms.pal.tsa.garch.GARCH(p=None, q=None, model_type=None)

Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) is a statistical model used to analyze the variance of the error (innovation or residual) term in a time series. It is typically used to analyze financial data, for example to estimate the volatility of stock and bond returns.

GARCH assumes that the variance of the error term is heteroskedastic, meaning it is not constant over time; in practice, periods of high volatility and periods of low volatility tend to cluster. GARCH further assumes that the conditional variance follows an autoregressive moving average (ARMA)-like pattern; in other words, it is a weighted combination of past squared errors and past conditional variances.

Assuming a time-series model:

\(y_t = \mu_t + \varepsilon_t\)

where \(\mu_t\) is the mean model (either an ARMA model or simply a constant). The quantity of main interest is \(\sigma_t^2 = var(\varepsilon_t|F_{t-1})\), i.e. the conditional variance of \(\varepsilon_t\), where \(F_{t-1}\) denotes the information set known at time t-1.

Then, a GARCH(p, q) model is defined as:

\(\sigma_t^2 = \alpha_0+\sum_{i=1}^p\alpha_i\varepsilon_{t-i}^2+\sum_{j=1}^q\beta_j\sigma_{t-j}^2\),

where \(\alpha_0 > 0\) and \(\alpha_i \geq 0, \beta_j\geq 0, i \in [1, p], j \in [1, q].\)
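The recursion above can be sketched in plain Python. This is a minimal illustration, not PAL's implementation: the function name `garch_variance` and the choice to fill lags that precede the first observation with a user-supplied `sigma2_init` are assumptions of this sketch, since PAL's initialization is not documented here.

```python
from typing import List, Sequence


def garch_variance(eps: Sequence[float], alpha0: float,
                   alpha: Sequence[float], beta: Sequence[float],
                   sigma2_init: float) -> List[float]:
    """Conditional variances sigma_t^2 of a GARCH(p, q) process (sketch).

    alpha[i - 1] weights eps_{t-i}^2 (i = 1..p) and beta[j - 1] weights
    sigma_{t-j}^2 (j = 1..q). Lags that fall before the first observation
    are replaced by sigma2_init (an assumption of this sketch).
    """
    if alpha0 <= 0 or any(a < 0 for a in alpha) or any(b < 0 for b in beta):
        raise ValueError("require alpha0 > 0, alpha_i >= 0 and beta_j >= 0")
    sigma2: List[float] = []
    for t in range(len(eps)):
        s = alpha0
        # ARCH part: p lagged squared errors
        for i, a in enumerate(alpha, start=1):
            s += a * (eps[t - i] ** 2 if t - i >= 0 else sigma2_init)
        # GARCH part: q lagged conditional variances
        for j, b in enumerate(beta, start=1):
            s += b * (sigma2[t - j] if t - j >= 0 else sigma2_init)
        sigma2.append(s)
    return sigma2
```

For example, `garch_variance([0.5, -1.0, 0.2], alpha0=0.1, alpha=[0.2], beta=[0.7], sigma2_init=0.5)` yields approximately 0.55, 0.535 and 0.6745: each step adds \(\alpha_0\) to the weighted previous squared error and the weighted previous variance.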

In this procedure, \(\mu_t\) is assumed to have already been subtracted from \(y_t\), so the input time series consists of \(\varepsilon_t\) only.
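For the simplest mean model, a constant \(\mu_t = \mu\) estimated by the sample mean, producing \(\varepsilon_t\) is a single subtraction. A minimal sketch with a hypothetical helper name; with an ARMA mean model you would use that model's residuals instead:

```python
from statistics import fmean
from typing import List, Sequence


def residuals_constant_mean(y: Sequence[float]) -> List[float]:
    """Return eps_t = y_t - mu for a constant mean model mu = mean(y)."""
    mu = fmean(y)  # sample mean as the estimate of the constant mean model
    return [v - mu for v in y]
```

The returned values are what would be stored in the time-series column that is passed as `endog` to `fit`.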

It is also assumed that \(\varepsilon_t | F_{t-1} \sim N(0,\sigma_t^2)\), so the model parameters (factors) can be estimated by maximum likelihood estimation (MLE).
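Under this normality assumption, the log-likelihood of the sample is \(-\frac{1}{2}\sum_t(\ln 2\pi + \ln\sigma_t^2 + \varepsilon_t^2/\sigma_t^2)\), and MLE picks the parameters that maximize it. The sketch below evaluates the negative log-likelihood of a GARCH(1, 1) model and, as a crude stand-in for a numerical optimizer, minimizes it over a small candidate grid. The function names and the `sigma2_init` start-up value are hypothetical; PAL's actual optimizer and initialization are not described here.

```python
import math
from typing import Iterable, Sequence, Tuple

Params = Tuple[float, float, float]  # (alpha0, alpha1, beta1)


def garch11_neg_loglik(eps: Sequence[float], alpha0: float, alpha1: float,
                       beta1: float, sigma2_init: float) -> float:
    """Negative Gaussian log-likelihood of GARCH(1, 1) residuals eps."""
    nll = 0.0
    # Pre-sample eps^2 and sigma^2 are both set to sigma2_init (assumption)
    prev_eps2, prev_sigma2 = sigma2_init, sigma2_init
    for e in eps:
        sigma2 = alpha0 + alpha1 * prev_eps2 + beta1 * prev_sigma2
        # One observation's contribution under eps_t | F_{t-1} ~ N(0, sigma2)
        nll += 0.5 * (math.log(2 * math.pi) + math.log(sigma2) + e * e / sigma2)
        prev_eps2, prev_sigma2 = e * e, sigma2
    return nll


def grid_mle(eps: Sequence[float], grid: Iterable[Params],
             sigma2_init: float) -> Params:
    """Return the candidate with the smallest negative log-likelihood."""
    return min(grid, key=lambda p: garch11_neg_loglik(
        eps, p[0], p[1], p[2], sigma2_init))
```

A real fit would search a continuous parameter space under the constraints \(\alpha_0 > 0\), \(\alpha_1, \beta_1 \geq 0\); the grid only shows that "estimation by MLE" means choosing the parameters with the lowest negative log-likelihood.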

Parameters:

p : int, optional

Specifies the number of lagged error terms in the GARCH model.

Valid only when model_type is not "igarch".

Defaults to 1.

q : int, optional

Specifies the number of lagged variance terms in the GARCH model.

Valid only when model_type is not "igarch".

Defaults to 1.

model_type : str, optional

Specifies the variant of the GARCH model.

'garch' : the regular GARCH model.
'igarch' : the integrated GARCH model.
'tgarch' : the threshold GARCH model.
'egarch' : the exponential GARCH model.

Defaults to 'garch'.

Examples

>>> df.collect()
    TIME  VAR1  VAR2 VAR3
0      1     2  0.17    A
1      2     2  0.19    A
...
18    19     2  1.32    A
19    20     2  1.10    A

Set up the hyper-parameters and train the GARCH model on the input data:

>>> gh = GARCH(p=1, q=1)
>>> gh.fit(data=df, key='TIME', endog='VAR2')
>>> gh.model_.collect()
   ROW_INDEX                                      MODEL_CONTENT
0          0  {"garch":{"factors":[0.13309395260165602,1.060...

Predict the future volatility (conditional variance) of the time series:

>>> pred_res, _ = gh.predict(horizon=5)
>>> pred_res.collect()
   STEP  VARIANCE RESIDUAL
0     1  1.415806     None
1     2  1.633979     None
2     3  1.865262     None
3     4  2.110445     None
4     5  2.370360     None
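The printed forecasts can be sanity-checked, assuming (this is not stated above) that `predict` applies the standard GARCH(1, 1) multi-step recursion \(\sigma_{T+h}^2 = \alpha_0 + (\alpha_1+\beta_1)\sigma_{T+h-1}^2\) for \(h \geq 2\). Under that recursion, successive increments of the forecast differ by the constant factor \(\alpha_1+\beta_1\), so their ratios should all be equal. The sketch uses only the five VARIANCE values shown in the output:

```python
# VARIANCE column copied from the predict(horizon=5) output above
forecast = [1.415806, 1.633979, 1.865262, 2.110445, 2.370360]

# Increments between consecutive steps; under the assumed recursion each
# increment equals (alpha1 + beta1) times the previous one
steps = [b - a for a, b in zip(forecast, forecast[1:])]
ratios = [b / a for a, b in zip(steps, steps[1:])]

# Average ratio as an estimate of alpha1 + beta1, then back out alpha0
# from forecast[1] = alpha0 + phi * forecast[0]
phi = sum(ratios) / len(ratios)
alpha0_implied = forecast[1] - phi * forecast[0]

print([round(r, 4) for r in ratios])
print(round(phi, 4), round(alpha0_implied, 4))
```

The ratios come out at about 1.06 each, which is consistent with that recursion, with an implied intercept of roughly 0.133. A persistence above 1 also explains why the forecast variance keeps growing with the horizon instead of levelling off at a long-run value.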

Attributes:

model_ : DataFrame

Model content.

variance_ : DataFrame

Variance information of the training data, structured as follows:

1st column : Same name and type as the index (timestamp) column in the training data.

2nd column : VARIANCE, type DOUBLE, representing the conditional variance of residual term.

3rd column : RESIDUAL, type DOUBLE, representing the residual value.

Set to None if the GARCH model has not been fitted.
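One common use of the VARIANCE and RESIDUAL columns is to compute standardized residuals \(z_t = \varepsilon_t/\sigma_t\); if the model and the normality assumption are adequate, these should behave roughly like independent N(0, 1) draws. A minimal sketch with a hypothetical helper name, operating on plain Python sequences (for instance the two columns of `gh.variance_.collect()`, which returns a pandas DataFrame):

```python
import math
from typing import List, Sequence


def standardized_residuals(residual: Sequence[float],
                           variance: Sequence[float]) -> List[float]:
    """z_t = RESIDUAL_t / sqrt(VARIANCE_t) for each time point."""
    if len(residual) != len(variance):
        raise ValueError("RESIDUAL and VARIANCE must have equal length")
    # Divide each residual by its conditional standard deviation
    return [r / math.sqrt(v) for r, v in zip(residual, variance)]
```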

stats_ : DataFrame

DataFrame for storing the related statistics in fitting GARCH model.

1st column : STAT_NAME, type NVARCHAR(1000)

2nd column : STAT_VALUE, type NVARCHAR(1000)

Methods

`fit`(data[, key, endog, thread_ratio])	Fit the model to the training dataset.
`get_model_metrics`()	Get the model metrics.
`get_score_metrics`()	Get the score metrics.
`predict`([horizon])	Predicts variance of error terms in time series based on trained GARCH model.

fit(data, key=None, endog=None, thread_ratio=None)

Fit the model to the training dataset.

Parameters:

data : DataFrame

Input data for fitting a GARCH model.

data should contain at least 2 columns, described as follows:

An index column of INTEGER or TIMESTAMP/DATE/SECONDDATE type, representing the time order (i.e. timestamp).

A numerical column representing the values of the time series.

key : str, optional

Specifies the name of the index column in data.

Mandatory if data is not indexed, or indexed by multiple columns.

Defaults to the single index column of data if not provided.

endog : str, optional

Specifies the name of the column holding the time-series values in data.

Cannot be the key column.

Defaults to the last non-key column in data.

thread_ratio : float, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.

Defaults to -1.

Returns:

A fitted object of class "GARCH".

predict(horizon=None)

Predicts variance of error terms in time series based on trained GARCH model.

Parameters:

data : DataFrame

Time-series data for predicting the variance of error terms. It should contain at least 2 columns, described as follows:

An index column of INTEGER/TIMESTAMP type, representing the time order (i.e. timestamp).
A numerical column representing the values of the time series.

horizon : int, optional

Specifies the number of steps to be forecasted.

Defaults to 1.
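The forecasting step can be sketched for the GARCH(1, 1) case. Assuming `predict` follows the standard approach (not confirmed here), the one-step forecast \(\sigma_{T+1}^2\) is fully determined by data up to time T, and each later step replaces the unknown \(\varepsilon^2\) by its conditional expectation \(\sigma^2\). The helper name and its arguments are hypothetical:

```python
from typing import List


def garch11_forecast(sigma2_next: float, alpha0: float, alpha1: float,
                     beta1: float, horizon: int) -> List[float]:
    """h-step-ahead conditional variance forecasts for GARCH(1, 1) (sketch).

    sigma2_next is the one-step forecast sigma_{T+1}^2. For h >= 2,
    sigma_{T+h}^2 = alpha0 + (alpha1 + beta1) * sigma_{T+h-1}^2.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    out = [sigma2_next]
    for _ in range(horizon - 1):
        # E[eps^2 | F_T] = sigma^2, so alpha1 and beta1 act together
        out.append(alpha0 + (alpha1 + beta1) * out[-1])
    return out
```

With \(\alpha_1+\beta_1 < 1\), the forecasts converge to the long-run variance \(\alpha_0/(1-\alpha_1-\beta_1)\) as horizon grows; for example, with alpha0=0.1, alpha1=0.2 and beta1=0.7 that level is 1.0.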

Returns:

Two DataFrames

1st DataFrame : the variance information.
2nd DataFrame : statistics.

get_model_metrics()

Get the model metrics.

Returns:

DataFrame: The model metrics.

get_score_metrics()

Get the score metrics.

Returns:

DataFrame: The score metrics.

Inherited Methods from PALBase

In addition to the methods listed above, the GARCH class inherits methods from the PALBase class; please refer to PAL Base for more details.