CoxProportionalHazardModel
- class hana_ml.algorithms.pal.regression.CoxProportionalHazardModel(tie_method=None, status_col=None, max_iter=None, convergence_criterion=None, significance_level=None, calculate_hazard=None, output_fitted=None, type_kind=None, thread_ratio=0.0)
The Cox proportional hazard model (CoxPHM) is a special generalized linear model. It is a widely used survival model that describes how covariates affect the risk of a failure or death event at a given time.
- Parameters
- tie_method : {'breslow', 'efron'}, optional
The method used to handle tied events.
Defaults to 'efron'.
- status_col : bool, optional (deprecated)
Specifies whether a status column is defined for right-censored data:
False : No status column. All response times are failure/death.
True : There is a status column, in which 0 indicates right-censored data and 1 indicates failure/death.
Defaults to True.
Deprecated; use the status_col parameter of the fit() method instead.
- max_iter : int, optional
Maximum number of iterations for numeric optimization.
- convergence_criterion : float, optional
Convergence criterion of coefficients for numeric optimization.
Defaults to 0.
- significance_level : float, optional
Significance level for the confidence interval of estimated coefficients.
Defaults to 0.05.
- calculate_hazard : bool, optional
Controls whether to calculate the hazard function as well as the survival function:
False : Does not calculate the hazard function.
True : Calculates the hazard function.
Defaults to True.
- output_fitted : bool, optional
Controls whether to output the fitted response:
False : Does not output the fitted response.
True : Outputs the fitted response.
Defaults to False.
- type_kind : str, optional (deprecated)
The prediction type:
'risk' : Predicts in risk space.
'lp' : Predicts in linear predictor space.
Defaults to 'risk'.
Deprecated; use the pred_type parameter of the predict() method instead.
- thread_ratio : float, optional (deprecated)
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 0.
- Attributes
- statistics_ : DataFrame
Regression-related statistics, such as R-squared, log-likelihood, and AIC.
- coefficient_ : DataFrame
Fitted regression coefficients.
- covariance_variance : DataFrame
Covariance-related data.
- hazard_ : DataFrame
Statistics related to time, hazard, and survival.
- fitted_ : DataFrame
Predicted dependent variable values for training data. Set to None if the training data has no row IDs.
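The significance_level parameter sets the width of the confidence interval reported for each estimated coefficient. The source does not say which interval PAL computes; the sketch below assumes a two-sided Wald (normal-approximation) interval, which is a common choice for Cox models. The function name wald_interval and the numeric inputs are hypothetical and not part of hana_ml.

```python
from statistics import NormalDist


def wald_interval(coefficient, std_error, significance_level=0.05):
    """Two-sided normal-approximation interval at level 1 - significance_level.

    Assumption: a Wald-type interval; PAL may compute its interval differently.
    """
    if not 0.0 < significance_level < 1.0:
        raise ValueError("significance_level must lie strictly between 0 and 1")
    # Standard normal quantile for the upper tail probability alpha / 2.
    z = NormalDist().inv_cdf(1.0 - significance_level / 2.0)
    return coefficient - z * std_error, coefficient + z * std_error


# Hypothetical coefficient estimate and standard error, for illustration only.
lower, upper = wald_interval(0.5, 0.1, significance_level=0.05)
lower_10, upper_10 = wald_interval(0.5, 0.1, significance_level=0.10)
```

A larger significance_level gives a narrower interval, since less coverage is requested.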
Methods
fit(data[, key, features, label, status_col]): Fit the model to the training dataset.
predict(data[, key, features, thread_ratio, ...]): Predict dependent variable values based on the fitted model.
score(data[, key, features, label]): Returns the coefficient of determination R² of the prediction.
Examples
>>> cox = CoxProportionalHazardModel(significance_level=0.05, calculate_hazard=True, type_kind='risk')
>>> cox.fit(data=df_train, key='ID', features=['X1', 'X2'], label='TIME')
Perform predict():
>>> cox.predict(data=df_predict, key='ID', features=['X1', 'X2']).collect()
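The tie_method parameter matters only when several failure/death events share the same time. As a rough illustration of how the two options differ, the following sketch evaluates the Cox partial log-likelihood with the textbook Breslow and Efron approximations in plain Python. It is not the PAL implementation; the function partial_log_likelihood and the toy data are assumptions made for this example.

```python
import math
from collections import defaultdict


def partial_log_likelihood(times, events, lps, tie_method="efron"):
    """Cox partial log-likelihood for given linear predictors.

    times  : observed times
    events : 1 for failure/death, 0 for right-censored
    lps    : linear predictor value for each observation
    """
    if tie_method not in ("breslow", "efron"):
        raise ValueError("tie_method must be 'breslow' or 'efron'")

    # Group the indices of observed events by their event time.
    deaths_at = defaultdict(list)
    for i, (t, e) in enumerate(zip(times, events)):
        if e == 1:
            deaths_at[t].append(i)

    total = 0.0
    for t, dead in deaths_at.items():
        d = len(dead)
        # Risk set: every observation whose time is at least t.
        risk_sum = sum(math.exp(lp) for s, lp in zip(times, lps) if s >= t)
        dead_sum = sum(math.exp(lps[i]) for i in dead)
        total += sum(lps[i] for i in dead)
        for j in range(d):
            if tie_method == "breslow":
                # Breslow: the full risk set is used for every tied event.
                total -= math.log(risk_sum)
            else:
                # Efron: tied events are removed from the risk set gradually.
                total -= math.log(risk_sum - (j / d) * dead_sum)
    return total


# Toy data (hypothetical): two tied events at time 1, one censored record.
times = [1, 1, 2, 3]
events = [1, 1, 1, 0]
lps = [0.0, 0.0, 0.0, 0.0]
breslow = partial_log_likelihood(times, events, lps, "breslow")
efron = partial_log_likelihood(times, events, lps, "efron")
```

When no event times are tied, both approximations give the same value, so the choice of tie_method has no effect on such data.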
- fit(data, key=None, features=None, label=None, status_col=None)
Fit the model to the training dataset.
- Parameters
- data : DataFrame
Training data.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns (including the covariates as well as the status column).
If not provided, defaults to all non-key, non-label columns.
- label : str, optional
Name of the dependent variable (the time before a failure/death event occurs or before the data is right-censored).
Defaults to the last non-ID column. (This is not the PAL default.)
- status_col : bool, optional
Specifies whether a status column is defined for right-censored data.
False : No status column. All response times are failure/death.
True : There is a status column in data, in which 0 indicates right-censored data and 1 indicates failure/death. The column should correspond to:
the 1st column in features if features is specified;
the 1st non-key, non-label column in data if features is not specified.
Defaults to True.
- Returns
- A fitted object of class "CoxProportionalHazardModel".
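The defaulting rules for label, features, and the status column described above can be easy to misread, so the following plain-Python sketch spells them out. The helper resolve_columns is hypothetical and only mirrors the documented rules; it is not a hana_ml function, and the column names are made up.

```python
def resolve_columns(columns, key, label=None, features=None, status_col=True):
    """Return (label, status column or None, covariates) per the documented rules.

    Hypothetical helper for illustration; not part of hana_ml.
    """
    non_key = [c for c in columns if c != key]
    if label is None:
        # Defaults to the last non-ID column.
        label = non_key[-1]
    if features is None:
        # Defaults to all non-key, non-label columns.
        features = [c for c in non_key if c != label]
    if status_col:
        # The status column is the 1st feature; the rest are covariates.
        return label, features[0], list(features[1:])
    return label, None, list(features)


# Made-up table layout: ID, status flag, two covariates, survival time.
columns = ["ID", "STATUS", "X1", "X2", "TIME"]
with_defaults = resolve_columns(columns, key="ID")
explicit = resolve_columns(columns, key="ID", label="TIME", features=["STATUS", "X1"])
no_status = resolve_columns(["ID", "X1", "X2", "TIME"], key="ID", status_col=False)
```

Under these rules, passing features without the status column as its first entry while status_col is True would cause the first covariate to be read as the status column.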
- predict(data, key=None, features=None, thread_ratio=None, pred_type=None, significance_level=None)
Predict dependent variable values based on fitted model.
- Parameters
- data : DataFrame
Independent variable values used for prediction.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the covariates.
- thread_ratio : float, optional (deprecated)
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 0.
- pred_type : str, optional
The prediction type:
'risk' : Predicts in risk space.
'lp' : Predicts in linear predictor space.
Defaults to 'risk'.
- significance_level : float, optional
Significance level for the confidence interval and prediction interval.
Defaults to 0.05.
- Returns
- DataFrame
Predicted values, structured as follows:
ID column, with the same name and type as the ID column of data.
VALUE, type DOUBLE, representing predicted values.
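The two pred_type options are related in the usual Cox formulation: the linear predictor is the weighted sum of the covariates, and the risk is its exponential. The sketch below illustrates that relationship in plain Python. It assumes no centering or other adjustment of the covariates, which the source does not specify; the coefficient values and the helper names are hypothetical.

```python
import math


def linear_predictor(coefficients, row):
    """Sum of coefficient * covariate value over all covariates."""
    return sum(coefficients[name] * row[name] for name in coefficients)


def predict_value(coefficients, row, pred_type="risk"):
    """Return the prediction in linear predictor ('lp') or risk ('risk') space."""
    lp = linear_predictor(coefficients, row)
    if pred_type == "lp":
        return lp
    if pred_type == "risk":
        # Assumption: risk is exp(lp), with no covariate centering.
        return math.exp(lp)
    raise ValueError("pred_type must be 'risk' or 'lp'")


# Hypothetical fitted coefficients and one row of covariate values.
coefficients = {"X1": 0.5, "X2": -0.25}
row = {"X1": 2.0, "X2": 4.0}
lp_value = predict_value(coefficients, row, pred_type="lp")
risk_value = predict_value(coefficients, row, pred_type="risk")
```

Because the exponential is monotonic, both prediction types rank observations in the same order; they differ only in scale.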
- score(data, key=None, features=None, label=None)
Returns the coefficient of determination R² of the prediction.
- Parameters
- data : DataFrame
Data on which to assess model performance.
- key : str, optional
Name of the ID column.
Mandatory if data is not indexed, or if the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- features : list of str, optional
Names of the feature columns.
- label : str, optional
Name of the dependent variable.
Defaults to the last non-ID column. (This is not the PAL default.)
- Returns
- float
The coefficient of determination R² of the prediction on the given data.
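For reference, the coefficient of determination is conventionally defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements that standard definition in plain Python. The source does not describe how score() derives its predictions or whether it uses exactly this formula, so treat this as an assumption; r2_score here is a local illustrative function and the numbers are made up.

```python
def r2_score(actual, predicted):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
r2 = r2_score(actual, predicted)
```

A value of 1 means the predictions match the observed values exactly, and a value of 0 means they do no better than always predicting the mean.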