CoxProportionalHazardModel
- class hana_ml.algorithms.pal.regression.CoxProportionalHazardModel(tie_method=None, status_col=None, max_iter=None, convergence_criterion=None, significance_level=None, calculate_hazard=None, output_fitted=None, type_kind=None, thread_ratio=0.0)
The Cox proportional hazard model (CoxPHM) is a special generalized linear model. It is a widely used survival model that describes the risk of failure or death at a given time.
- Parameters:
- tie_method : {'breslow', 'efron'}, optional
The method to deal with tied events.
Defaults to 'efron'.
- status_col : bool, optional (deprecated)
Whether a status column is defined for right-censored data:
False: No status column. All response times are failure/death.
True: There is a status column, in which 0 indicates right-censored data and 1 indicates failure/death.
Defaults to True.
Deprecated; use the status_col parameter of the fit() method instead.
- max_iter : int, optional
Maximum number of iterations for numeric optimization.
- convergence_criterion : float, optional
Convergence criterion of coefficients for numeric optimization.
Defaults to 0.
- significance_level : float, optional
Significance level for the confidence interval of estimated coefficients.
Defaults to 0.05.
- calculate_hazard : bool, optional
Controls whether to calculate the hazard function as well as the survival function.
False: Does not calculate the hazard function.
True: Calculates the hazard function.
Defaults to True.
- output_fitted : bool, optional
Controls whether to output the fitted response:
False : Does not output the fitted response.
True: Outputs the fitted response.
Defaults to False.
- type_kind : str, optional (deprecated)
The prediction type:
'risk': Predicts in risk space.
'lp': Predicts in linear predictor space.
Defaults to 'risk'.
Deprecated; use the pred_type parameter of the predict() method instead.
- thread_ratio : float, optional (deprecated)
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Defaults to 0.
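The two tie_method options can be illustrated with the textbook Breslow and Efron approximations to the partial likelihood. The following is a minimal pure-Python sketch of the contribution of a single event time with tied failures; the function name and arguments are hypothetical, and it is an assumption that PAL follows these standard formulas exactly.

```python
import math


def tied_event_loglik(lp_events, lp_risk_set, method="efron"):
    """Partial log-likelihood contribution of one distinct event time.

    Hypothetical helper for illustration only; not part of hana_ml.

    lp_events:   linear predictors of the subjects failing at this time
    lp_risk_set: linear predictors of all subjects still at risk,
                 including the failing ones
    """
    d = len(lp_events)
    risk_sum = sum(math.exp(v) for v in lp_risk_set)
    event_sum = sum(math.exp(v) for v in lp_events)
    numerator = sum(lp_events)
    if method == "breslow":
        # Breslow: every tied event uses the full risk-set sum.
        return numerator - d * math.log(risk_sum)
    if method == "efron":
        # Efron: the k-th tied event removes a fraction k/d of the
        # tied subjects' weight from the risk-set sum.
        return numerator - sum(
            math.log(risk_sum - (k / d) * event_sum) for k in range(d)
        )
    raise ValueError("method must be 'breslow' or 'efron'")


# Two tied failures among four subjects with identical covariates.
breslow = tied_event_loglik([0.0, 0.0], [0.0] * 4, method="breslow")
efron = tied_event_loglik([0.0, 0.0], [0.0] * 4, method="efron")
```

With a single failure at a given time the two methods coincide; they differ only when several subjects share the same event time.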
Examples
>>> cox = CoxProportionalHazardModel(significance_level=0.05, calculate_hazard=True, type_kind='risk')
>>> cox.fit(data=df_train, key='ID', features=['X1', 'X2'], label='TIME')
Perform predict():
>>> cox.predict(data=df_predict, key='ID', features=['X1', 'X2']).collect()
- Attributes:
- statistics_ : DataFrame
Regression-related statistics, such as R-squared, log-likelihood, and AIC.
- coefficient_ : DataFrame
Fitted regression coefficients.
- covariance_variance : DataFrame
Covariance-related data.
- hazard_ : DataFrame
Statistics related to time, hazard, and survival.
- fitted_ : DataFrame
Predicted dependent variable values for training data. Set to None if the training data has no row IDs.
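As a rough illustration of how significance_level relates to the fitted coefficients, the sketch below computes a standard Wald confidence interval from a coefficient and its standard error (the square root of the matching diagonal entry of a covariance matrix). The helper name and the numbers are hypothetical, and it is an assumption that PAL reports intervals of this form.

```python
import math
from statistics import NormalDist


def wald_interval(coef, std_err, significance_level=0.05):
    """Two-sided Wald interval for one coefficient.

    Hypothetical helper for illustration only; not part of hana_ml.
    """
    # Normal quantile for a two-sided interval, e.g. about 1.96 for 0.05.
    z = NormalDist().inv_cdf(1.0 - significance_level / 2.0)
    return coef - z * std_err, coef + z * std_err


# Made-up coefficient and standard error.
lower, upper = wald_interval(0.5, 0.2, significance_level=0.05)

# Exponentiating the bounds gives an interval for the hazard ratio.
hazard_ratio_interval = (math.exp(lower), math.exp(upper))
```

A smaller significance_level widens the interval, since the normal quantile grows as the confidence level rises.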
Methods
- fit(data[, key, features, label, status_col]): Fit the model to the training dataset.
- predict(data[, key, features, thread_ratio, ...]): Predict dependent variable values based on the fitted model.
- score(data[, key, features, label]): Return the coefficient of determination R2 of the prediction.
- fit(data, key=None, features=None, label=None, status_col=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
Training data.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns (covariates as well as the status column).
If not provided, defaults to all non-key, non-label columns.
- label : str, optional
Name of the dependent variable, indicating the time before a failure/death event occurs or the data is right-censored.
Defaults to the last non-ID column. (This is not the PAL default.)
- status_col : bool, optional
Specifies whether a status column is defined for right-censored data.
False: No status column. All response times are failure/death.
True: There is a status column in data, in which 0 indicates right-censored data and 1 indicates failure/death. The column should correspond to:
the 1st column in features, if features is specified;
the 1st non-key, non-label column in data, if features is not specified.
Defaults to True.
- Returns:
- A fitted object of class "CoxProportionalHazardModel".
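The default rules for label, features, and the position of the status column can be easy to misread, so here is a minimal pure-Python sketch of how they resolve. The helper resolve_fit_columns and the column names are hypothetical and only mirror the rules as documented above; this is not the hana_ml implementation.

```python
def resolve_fit_columns(columns, key, features=None, label=None, status_col=True):
    """Apply the documented defaults of fit() to a list of column names.

    Hypothetical helper for illustration only; not part of hana_ml.
    """
    non_key = [c for c in columns if c != key]
    if label is None:
        # Documented default: the last non-ID column.
        label = non_key[-1]
    if features is None:
        # Documented default: all non-key, non-label columns.
        features = [c for c in non_key if c != label]
    if status_col:
        # The status column is the first entry of the feature list.
        status, covariates = features[0], list(features[1:])
    else:
        status, covariates = None, list(features)
    return {"label": label, "status": status, "covariates": covariates}


table_columns = ["ID", "STATUS", "X1", "X2", "TIME"]

# Everything left at its default.
defaults = resolve_fit_columns(table_columns, key="ID")

# Explicit features without a leading status column.
explicit = resolve_fit_columns(
    table_columns, key="ID", features=["X1", "X2"], label="TIME"
)
```

In the second call, X1 is read as the status column because status_col defaults to True. If the data has no status column, pass status_col=False; otherwise list the status column first in features.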
- predict(data, key=None, features=None, thread_ratio=None, pred_type=None, significance_level=None)
Predict dependent variable values based on fitted model.
- Parameters:
- data : DataFrame
Independent variable values used for prediction.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the covariates.
- thread_ratio : float, optional (deprecated)
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all possible current threads. Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Defaults to 0.
- pred_type : str, optional
The prediction type:
'risk': Predicts in risk space.
'lp': Predicts in linear predictor space.
Defaults to 'risk'.
- significance_level : float, optional
Significance level for the confidence interval and prediction interval.
Defaults to 0.05.
- Returns:
- DataFrame
Predicted values, structured as follows:
ID column, with the same name and type as the ID column of data.
VALUE, type DOUBLE, representing predicted values.
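The two pred_type options are related in the usual Cox convention: the linear predictor is the weighted sum of the covariates, and the risk score is its exponential. The sketch below shows that relationship in plain Python. The function names and coefficient values are hypothetical, and it is an assumption that PAL applies no extra centering or scaling of the covariates.

```python
import math


def linear_predictor(coefficients, covariates):
    """Weighted sum of covariates (hypothetical helper, not hana_ml)."""
    return sum(b * x for b, x in zip(coefficients, covariates))


def predict_value(coefficients, covariates, pred_type="risk"):
    """Return the prediction in linear-predictor or risk space."""
    lp = linear_predictor(coefficients, covariates)
    if pred_type == "lp":
        return lp
    if pred_type == "risk":
        # Risk space is the exponential of the linear predictor.
        return math.exp(lp)
    raise ValueError("pred_type must be 'risk' or 'lp'")


# Made-up coefficients and one row of covariates.
coefs = [0.5, -0.25]
row = [2.0, 2.0]
lp_value = predict_value(coefs, row, pred_type="lp")
risk_value = predict_value(coefs, row, pred_type="risk")
```

Because the exponential is monotonic, both prediction types rank observations identically; only the scale differs.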
- score(data, key=None, features=None, label=None)
Returns the coefficient of determination R2 of the prediction.
- Parameters:
- data : DataFrame
Data on which to assess model performance.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column. (This is not the PAL default.)
- Returns:
- float
The coefficient of determination R2 of the prediction on the given data.
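For reference, the coefficient of determination is conventionally defined as one minus the ratio of the residual sum of squares to the total sum of squares. The following pure-Python sketch uses that textbook definition. The function name and sample values are hypothetical, and it is an assumption that score() compares predictions with the label column in exactly this way.

```python
def r2_score(actual, predicted):
    """Textbook R2: 1 - SS_res / SS_tot (hypothetical helper, not hana_ml)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up observed and predicted values.
observed = [1.0, 2.0, 3.0]
partial_fit = r2_score(observed, [1.5, 2.0, 2.5])
perfect_fit = r2_score(observed, [1.0, 2.0, 3.0])
```

A value of 1 means the predictions match the observed values exactly, 0 means they do no better than always predicting the mean, and negative values are possible for worse fits.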
Inherited Methods from PALBase
In addition to the methods described above, the CoxProportionalHazardModel class inherits methods from the PALBase class. Refer to PAL Base for more details.