kaplan_meier_survival_analysis

hana_ml.algorithms.pal.stats.kaplan_meier_survival_analysis(data, event_indicator=None, conf_level=None)

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data. It is often used to measure the time-to-death of patients after treatment or time-to-failure of machine parts.
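
For reference, the textbook product-limit form of the estimator is given below. The symbols are not taken from this page: $t_i$ denotes the distinct event times, $d_i$ the number of events at $t_i$, and $n_i$ the number of subjects still at risk just before $t_i$.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```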

Parameters:

data : DataFrame

Input DataFrame holding the lifetime data. The example below uses the columns TIME, STATUS, OCCURRENCES and GROUP.

event_indicator : int, optional

Specifies the value that indicates an event has occurred.

Defaults to 1.

conf_level : float, optional

Specifies the confidence level for a two-sided confidence interval on the survival estimate.

Defaults to 0.95.
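
As a rough illustration of what a two-sided confidence level means, the sketch below converts conf_level into a standard normal quantile and builds a plain symmetric interval around an estimate. This page does not state which interval formula the function uses, so both helper names (two_sided_z, plain_interval) and the symmetric formula itself are assumptions made for illustration only.

```python
from statistics import NormalDist


def two_sided_z(conf_level=0.95):
    """Standard normal quantile for a two-sided interval (hypothetical helper)."""
    alpha = 1.0 - conf_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def plain_interval(probability, se, conf_level=0.95):
    """Symmetric interval clipped to [0, 1].

    Assumption: one common textbook choice, not necessarily
    the formula behind CI_LOWER and CI_UPPER.
    """
    z = two_sided_z(conf_level)
    lower = max(0.0, probability - z * se)
    upper = min(1.0, probability + z * se)
    return lower, upper


z_95 = two_sided_z(0.95)
interval = plain_interval(0.75, 0.2165)
```

A higher conf_level gives a larger quantile and therefore a wider interval.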

Returns:

DataFrames

Survival estimates, structured as follows:

GROUP, group.

TIME, event occurrence time. Survival estimates at all event times are output.

RISK_NUMBER, number at risk (total number of survivors at the beginning of each period).

EVENT_NUMBER, number of event occurrences.

PROBABILITY, probability of surviving beyond event occurrence time.

SE, standard error for the survivor estimate.

CI_LOWER, lower bound of confidence interval.

CI_UPPER, upper bound of confidence interval.
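
The columns above can be illustrated with a small pure-Python sketch for a single group. It is not the library's implementation: the function name km_estimate is hypothetical, the standard error uses Greenwood's formula as an assumption (this page does not say which formula is used), and the input rows are the GROUP 2 records from the example data, with OCCURRENCES treated as a multiplicity.

```python
from collections import defaultdict
from math import sqrt


def km_estimate(records, event_indicator=1):
    """Return (TIME, RISK_NUMBER, EVENT_NUMBER, PROBABILITY, SE) per event time.

    records: iterable of (time, status, occurrences) for one group.
    """
    events = defaultdict(int)   # time -> number of events
    leaving = defaultdict(int)  # time -> events plus censored observations
    for time, status, occurrences in records:
        leaving[time] += occurrences
        if status == event_indicator:
            events[time] += occurrences

    at_risk = sum(leaving.values())
    survival = 1.0
    greenwood = 0.0  # running sum of d / (n * (n - d))
    rows = []
    for time in sorted(leaving):
        d = events.get(time, 0)
        if d:
            survival *= 1.0 - d / at_risk
            if at_risk > d:
                greenwood += d / (at_risk * (at_risk - d))
            # Assumption: Greenwood standard error
            se = survival * sqrt(greenwood)
            rows.append((time, at_risk, d, survival, se))
        # Everyone observed at this time leaves the risk set afterwards
        at_risk -= leaving[time]
    return rows


# GROUP 2 rows of the example data: (TIME, STATUS, OCCURRENCES)
group2 = [(9, 1, 1), (30, 1, 2), (101, 0, 1)]
km_rows = km_estimate(group2)
```

Only event times produce a row, in line with the TIME description above; the censored observation at time 101 reduces the risk set but yields no estimate of its own.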

Log rank test statistics result 1, structured as follows:

GROUP, group.

TOTAL_RISK, all individuals in the lifetime study.

OBSERVED, observed event number.

EXPECTED, expected event number.

LOGRANK_STAT, log-rank test statistic for the group.

Log rank test statistics result 2, structured as follows:

STAT_NAME, name of the statistic.

STAT_VALUE, value of the statistic.
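
The per-group columns of the first log-rank table can be reproduced with the standard log-rank calculation, sketched below in pure Python. The function name logrank_by_group is hypothetical, and it is an assumption that the library computes the values this way. With the rows from the Examples section, though, this sketch matches the EXPECTED and LOGRANK_STAT values shown there, taking LOGRANK_STAT as (OBSERVED - EXPECTED)^2 divided by the hypergeometric variance of the group's event count.

```python
from collections import defaultdict

# (TIME, STATUS, OCCURRENCES, GROUP) rows copied from the example data
DATA = [
    (9, 1, 1, 2), (10, 1, 1, 1), (1, 1, 2, 0), (31, 0, 1, 1),
    (2, 1, 1, 0), (25, 1, 3, 1), (255, 0, 1, 0), (90, 1, 1, 0),
    (22, 1, 1, 1), (100, 0, 1, 1), (28, 0, 1, 0), (5, 1, 1, 1),
    (7, 1, 1, 1), (11, 0, 1, 0), (20, 0, 1, 0), (30, 1, 2, 2),
    (101, 0, 1, 2), (8, 0, 1, 1),
]


def logrank_by_group(rows, event_indicator=1):
    """Return {group: (total_risk, observed, expected, logrank_stat)}."""
    groups = sorted({group for _, _, _, group in rows})
    leaving = defaultdict(lambda: defaultdict(int))  # time -> group -> count
    events = defaultdict(int)                        # time -> events, all groups
    observed = defaultdict(int)
    at_risk = defaultdict(int)
    for time, status, occurrences, group in rows:
        leaving[time][group] += occurrences
        at_risk[group] += occurrences
        if status == event_indicator:
            events[time] += occurrences
            observed[group] += occurrences
    total_risk = dict(at_risk)

    expected = defaultdict(float)
    variance = defaultdict(float)
    for time in sorted(leaving):
        n = sum(at_risk.values())  # pooled risk set just before this time
        d = events.get(time, 0)
        if d:
            for g in groups:
                share = at_risk[g] / n
                expected[g] += d * share
                if n > 1:
                    # Hypergeometric variance of the group's event count
                    variance[g] += d * share * (1 - share) * (n - d) / (n - 1)
        for g, count in leaving[time].items():
            at_risk[g] -= count

    result = {}
    for g in groups:
        diff = observed[g] - expected[g]
        stat = diff * diff / variance[g] if variance[g] > 0 else float("nan")
        result[g] = (total_risk[g], observed[g], expected[g], stat)
    return result


logrank = logrank_by_group(DATA)
```

Subjects censored at an event time are still counted as at risk at that time and are removed only afterwards, which is the usual convention for this test.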

Examples

Original data:

>>> df.collect()
    TIME  STATUS  OCCURRENCES  GROUP
       9       1            1      2
      10       1            1      1
       1       1            2      0
      31       0            1      1
       2       1            1      0
      25       1            3      1
     255       0            1      0
      90       1            1      0
      22       1            1      1
     100       0            1      1
      28       0            1      0
       5       1            1      1
       7       1            1      1
      11       0            1      0
      20       0            1      0
      30       1            2      2
     101       0            1      2
       8       0            1      1

Call the function:

>>> survival_estimates, res, stats = kaplan_meier_survival_analysis(df)
>>> res.collect()
  GROUP  TOTAL_RISK  OBSERVED  EXPECTED  LOGRANK_STAT
0     0           8         4  4.353652      0.045712
1     1          10         7  6.024638      0.307951
2     2           4         3  3.621710      0.161786