kaplan_meier_survival_analysis
- hana_ml.algorithms.pal.stats.kaplan_meier_survival_analysis(data, event_indicator=None, conf_level=None)
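The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data. It is commonly applied to data such as the time to death of patients after treatment or the time to failure of machine parts. This function computes survival estimates for each group and compares the groups with a log-rank test.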
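For reference, the textbook product-limit form of the estimator is shown below. Here t_i are the distinct event times, d_i is the number of events at t_i, and n_i is the number of individuals still at risk just before t_i. As a worked check, group 2 of the example data below has 4 individuals: one event at time 9, two events at time 30, and one censored observation at time 101.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)

\hat{S}(9) = 1 - \frac{1}{4} = 0.75, \qquad
\hat{S}(30) = 0.75 \left(1 - \frac{2}{3}\right) = 0.25
```

Censored observations lower the number at risk for later event times but do not change the estimate themselves.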
- Parameters
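- data : DataFrame
DataFrame containing the input data. In the example below, its columns are TIME (observation time), STATUS (event or censoring status), OCCURRENCES (number of individuals sharing that record), and GROUP.
- event_indicator : int, optional
Specifies the status value that indicates an event has occurred.
Defaults to 1.
- conf_level : float, optional
Specifies the confidence level for the two-sided confidence interval on the survival estimate.
Defaults to 0.95.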
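How the two optional parameters act can be sketched in plain Python. This is an illustration under assumptions, not the PAL implementation: the (time, status, occurrences) tuple layout and the helper names two_sided_z and count_events are hypothetical, and mapping conf_level to a standard normal quantile assumes the interval is built from a normal approximation.

```python
from statistics import NormalDist


def two_sided_z(conf_level=0.95):
    """Standard normal critical value z with P(-z < Z < z) = conf_level (assumed normal interval)."""
    if not 0.0 < conf_level < 1.0:
        raise ValueError("conf_level must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)


def count_events(rows, event_indicator=1):
    """Sum OCCURRENCES over rows whose STATUS equals event_indicator.

    rows: hypothetical (time, status, occurrences) tuples.
    """
    return sum(count for _, status, count in rows if status == event_indicator)


if __name__ == "__main__":
    group2 = [(9, 1, 1), (30, 1, 2), (101, 0, 1)]  # group 2 of the example data
    print(round(two_sided_z(0.95), 4))
    print(count_events(group2))
```

With conf_level=0.95 the critical value is about 1.96; with 0.90 it is about 1.645.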
- Returns
- tuple of DataFrames
DataFrame 1: survival estimates, structured as follows:
GROUP, group identifier.
TIME, event occurrence time. Survival estimates are output at every event time.
RISK_NUMBER, number at risk (individuals who have neither had the event nor been censored before TIME).
EVENT_NUMBER, number of events that occurred at TIME.
PROBABILITY, estimated probability of surviving beyond TIME.
SE, standard error of the survival estimate.
CI_LOWER, lower bound of the confidence interval.
CI_UPPER, upper bound of the confidence interval.
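DataFrame 2: log-rank test statistics per group, structured as follows:
GROUP, group identifier.
TOTAL_RISK, total number of individuals in the group (the sum of its OCCURRENCES).
OBSERVED, observed number of events.
EXPECTED, expected number of events under the null hypothesis that all groups share the same survival function.
LOGRANK_STAT, log-rank test statistic for the group.
DataFrame 3: log-rank test statistics as name/value pairs, structured as follows:
STAT_NAME, name of the statistic.
STAT_VALUE, value of the statistic.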
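A plain-Python sketch of how the survival-estimate columns relate to each other for a single group. It assumes Greenwood's formula for SE and a plain (linear) normal interval clipped to [0, 1] for CI_LOWER and CI_UPPER; PAL may use a different variance formula or interval transform (log or log-log intervals are common), so its SE and interval values can differ. The function name survival_table and the (time, status, occurrences) record layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def survival_table(records, event_indicator=1, conf_level=0.95):
    """Kaplan-Meier table for one group.

    records: hypothetical (time, status, occurrences) tuples.
    Returns one tuple per event time:
    (TIME, RISK_NUMBER, EVENT_NUMBER, PROBABILITY, SE, CI_LOWER, CI_UPPER).
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    events = defaultdict(int)  # time -> number of events
    exits = defaultdict(int)   # time -> number leaving the risk set (events and censored)
    for time, status, count in records:
        exits[time] += count
        if status == event_indicator:
            events[time] += count

    at_risk = sum(exits.values())
    surv = 1.0
    greenwood = 0.0  # running sum of d / (n * (n - d)), assumed Greenwood variance
    rows = []
    for time in sorted(exits):
        d = events[time]
        if d:
            surv *= 1.0 - d / at_risk
            if d < at_risk:
                greenwood += d / (at_risk * (at_risk - d))
                se = surv * math.sqrt(greenwood)
            else:
                se = 0.0  # every remaining individual had the event, so S(t) = 0
            lower = max(0.0, surv - z * se)  # assumed plain interval, clipped to [0, 1]
            upper = min(1.0, surv + z * se)
            rows.append((time, at_risk, d, surv, se, lower, upper))
        at_risk -= exits[time]  # those censored at this time were still at risk for its events
    return rows


# Group 2 of the example data: event at 9, two events at 30, censored at 101.
for row in survival_table([(9, 1, 1), (30, 1, 2), (101, 0, 1)]):
    print(row)
```

For group 2 of the example data, this gives PROBABILITY 0.75 at TIME 9 (RISK_NUMBER 4) and 0.25 at TIME 30 (RISK_NUMBER 3).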
Examples
Original data:
>>> df.collect()
    TIME  STATUS  OCCURRENCES  GROUP
0      9       1            1      2
1     10       1            1      1
2      1       1            2      0
3     31       0            1      1
4      2       1            1      0
5     25       1            3      1
6    255       0            1      0
7     90       1            1      0
8     22       1            1      1
9    100       0            1      1
10    28       0            1      0
11     5       1            1      1
12     7       1            1      1
13    11       0            1      0
14    20       0            1      0
15    30       1            2      2
16   101       0            1      2
17     8       0            1      1
Run the analysis and inspect the per-group log-rank statistics:
>>> survival_estimates, res, stats = kaplan_meier_survival_analysis(data=df)
>>> res.collect()
   GROUP  TOTAL_RISK  OBSERVED  EXPECTED  LOGRANK_STAT
0      0           8         4  4.353652      0.045712
1      1          10         7  6.024638      0.307951
2      2           4         3  3.621710      0.161786
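The per-group columns of res can be reproduced with a textbook log-rank computation. The sketch below uses plain Python without hana_ml and assumes a hypothetical (time, status, occurrences, group) record layout; at each distinct event time it adds each group's expected events d * n_g / n and its hypergeometric variance term. Run on the example data, it yields the TOTAL_RISK, OBSERVED, EXPECTED and LOGRANK_STAT values shown in res to six decimal places. That LOGRANK_STAT equals (O - E)^2 / V per group is inferred from this match, not taken from the PAL documentation, and the STAT_NAME/STAT_VALUE table is not reproduced.

```python
from collections import defaultdict


def log_rank_by_group(records, event_indicator=1):
    """Per-group log-rank quantities.

    records: hypothetical (time, status, occurrences, group) tuples.
    Returns {group: (TOTAL_RISK, OBSERVED, EXPECTED, stat)} with stat = (O - E)**2 / V.
    """
    exits = defaultdict(lambda: defaultdict(int))   # time -> group -> count leaving risk set
    events = defaultdict(lambda: defaultdict(int))  # time -> group -> event count
    at_risk = defaultdict(int)                      # group -> number currently at risk
    for time, status, count, group in records:
        exits[time][group] += count
        at_risk[group] += count
        if status == event_indicator:
            events[time][group] += count

    groups = sorted(at_risk)
    total = dict(at_risk)
    observed = dict.fromkeys(groups, 0)
    expected = dict.fromkeys(groups, 0.0)
    variance = dict.fromkeys(groups, 0.0)
    for time in sorted(exits):
        n = sum(at_risk.values())       # pooled number at risk just before `time`
        d = sum(events[time].values())  # pooled number of events at `time`
        if d:
            for g in groups:
                share = at_risk[g] / n
                observed[g] += events[time][g]
                expected[g] += d * share
                if n > 1:  # hypergeometric variance term for this group
                    variance[g] += d * share * (1 - share) * (n - d) / (n - 1)
        for g, c in exits[time].items():
            at_risk[g] -= c

    return {
        g: (total[g], observed[g], expected[g],
            (observed[g] - expected[g]) ** 2 / variance[g] if variance[g] else float("nan"))
        for g in groups
    }


# Rows of the example df: (TIME, STATUS, OCCURRENCES, GROUP)
EXAMPLE = [
    (9, 1, 1, 2), (10, 1, 1, 1), (1, 1, 2, 0), (31, 0, 1, 1), (2, 1, 1, 0),
    (25, 1, 3, 1), (255, 0, 1, 0), (90, 1, 1, 0), (22, 1, 1, 1), (100, 0, 1, 1),
    (28, 0, 1, 0), (5, 1, 1, 1), (7, 1, 1, 1), (11, 0, 1, 0), (20, 0, 1, 0),
    (30, 1, 2, 2), (101, 0, 1, 2), (8, 0, 1, 1),
]

for group, (total_risk, obs, expected_n, stat) in log_rank_by_group(EXAMPLE).items():
    print(group, total_risk, obs, round(expected_n, 6), round(stat, 6))
```

Because the expected counts are computed from the pooled risk set, they always sum to the total number of observed events (14 in the example).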