kaplan_meier_survival_analysis
- hana_ml.algorithms.pal.stats.kaplan_meier_survival_analysis(data, event_indicator=None, conf_level=None)
The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data. It is often used to measure the time-to-death of patients after treatment or time-to-failure of machine parts.
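To make the estimator concrete, here is a minimal pure-Python sketch of the product-limit calculation. It does not call hana_ml or PAL and is not their implementation. The function name `kaplan_meier` and the toy data are hypothetical and serve only as illustration. At each distinct event time the running survival probability is multiplied by (1 - events / number at risk); censored observations leave the risk set without changing the estimate.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return (time, at_risk, n_events, survival) for each event time.

    times  -- observed durations
    events -- 1 if the event occurred at that time, 0 if censored
    """
    # Number of events at each distinct time.
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    # Events and censorings both remove subjects from the risk set.
    exits = Counter(times)

    at_risk = len(times)
    survival = 1.0
    rows = []
    for t in sorted(exits):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1.0 - d / at_risk
            rows.append((t, at_risk, d, survival))
        at_risk -= exits[t]
    return rows


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for row in kaplan_meier(toy_times, toy_events):
    print(row)
```

The four values in each tuple correspond loosely to the TIME, RISK_NUMBER, EVENT_NUMBER and PROBABILITY columns of the survival estimates table returned by this function. The sketch omits grouping, standard errors and confidence intervals.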
- Parameters:
- data : DataFrame
DataFrame containing the data.
- event_indicator : int, optional
The value that indicates an event has occurred.
Defaults to 1.
- conf_level : float, optional
Confidence level for the two-sided confidence interval on the survival estimate.
Defaults to 0.95.
- Returns:
- DataFrames
DataFrame 1 : Survival estimates, structured as follows:
GROUP, group.
TIME, event occurrence time. Survival estimates at all event times are output.
RISK_NUMBER, number at risk (total number of survivors at the beginning of each period).
EVENT_NUMBER, number of event occurrences.
PROBABILITY, probability of surviving beyond event occurrence time.
SE, standard error for the survivor estimate.
CI_LOWER, lower bound of confidence interval.
CI_UPPER, upper bound of confidence interval.
DataFrame 2 : Log rank test statistics result 1, structured as follows:
GROUP, group.
TOTAL_RISK, all individuals in the lifetime study.
OBSERVED, observed event number.
EXPECTED, expected event number.
LOGRANK_STAT, log rank test statistics.
DataFrame 3 : Log rank test statistics result 2, structured as follows:
STAT_NAME, name of statistics.
STAT_VALUE, value of statistics.
Examples
>>> df.collect()
    TIME  STATUS  OCCURRENCES  GROUP
0      9       1            1      2
1     10       1            1      1
...
17     8       0            1      1
>>> survival_estimates, res, stats = kaplan_meier_survival_analysis(data=df)
>>> res.collect()
   GROUP  TOTAL_RISK  OBSERVED  EXPECTED  LOGRANK_STAT
0      0           8         4  4.353652      0.045712
1      1          10         7  6.024638      0.307951
2      2           4         3  3.621710      0.161786