kaplan_meier_survival_analysis

hana_ml.algorithms.pal.stats.kaplan_meier_survival_analysis(data, event_indicator=None, conf_level=None)

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data. It is often used to measure the time-to-death of patients after treatment or time-to-failure of machine parts.
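For reference, the estimator is usually written in its product-limit form. The symbols below are notation introduced here for illustration and are not names used by the function: t_i denotes the distinct times at which events occur, d_i the number of events at t_i, and n_i the number of individuals still at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored observations contribute to n_i up to the time they leave the study but never to d_i.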

Parameters
data : DataFrame

DataFrame containing the data.

event_indicator : int, optional

Specifies the value that indicates an event has occurred.

Defaults to 1.

conf_level : float, optional

Specifies the confidence level for the two-sided confidence interval on the survival estimate.

Defaults to 0.95.

Returns
DataFrames

Survival estimates, structured as follows:

  • GROUP, group.

  • TIME, event occurrence time. Survival estimates at all event times are output.

  • RISK_NUMBER, number at risk (total number of survivors at the beginning of each period).

  • EVENT_NUMBER, number of event occurrences.

  • PROBABILITY, probability of surviving beyond event occurrence time.

  • SE, standard error for the survivor estimate.

  • CI_LOWER, lower bound of confidence interval.

  • CI_UPPER, upper bound of confidence interval.
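As a rough illustration of how the TIME, RISK_NUMBER, EVENT_NUMBER and PROBABILITY columns relate to each other, the following pure-Python sketch applies the textbook product-limit calculation to a single group. It is not the PAL implementation: the helper name `kaplan_meier` is hypothetical, treating OCCURRENCES as a per-row weight is an assumption, and SE and the confidence bounds are left out because the source does not say which variance formula the function uses. The input rows are the GROUP 2 records from the Examples section.

```python
from collections import defaultdict


def kaplan_meier(rows, event_indicator=1):
    """Product-limit estimate for one group.

    rows: iterable of (time, status, occurrences) tuples.
    Returns a list of (time, risk_number, event_number, probability),
    one entry per distinct event time. Hypothetical helper, not part of hana_ml.
    """
    events = defaultdict(int)   # time -> weighted number of events
    leaving = defaultdict(int)  # time -> weighted number leaving the risk set
    for time, status, occurrences in rows:
        leaving[time] += occurrences
        if status == event_indicator:
            events[time] += occurrences

    at_risk = sum(leaving.values())  # everyone is at risk at the start
    survival = 1.0
    table = []
    for time in sorted(leaving):
        d = events.get(time, 0)
        if d > 0:
            survival *= 1.0 - d / at_risk
            table.append((time, at_risk, d, survival))
        # Events and censored observations both leave the risk set here.
        at_risk -= leaving[time]
    return table


# (TIME, STATUS, OCCURRENCES) for GROUP 2 in the Examples section
group_2 = [(9, 1, 1), (30, 1, 2), (101, 0, 1)]
km_table = kaplan_meier(group_2)
for time, risk_number, event_number, probability in km_table:
    print(time, risk_number, event_number, round(probability, 4))
# 9 4 1 0.75
# 30 3 2 0.25
```

The censored record at time 101 produces no output row, which is in line with the description of TIME as covering event times only.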

Log rank test statistics result 1, structured as follows:

  • GROUP, group.

  • TOTAL_RISK, total number of individuals in the group at the start of the study.

  • OBSERVED, observed event number.

  • EXPECTED, expected event number.

  • LOGRANK_STAT, log rank test statistics.

Log rank test statistics result 2, structured as follows:

  • STAT_NAME, name of the statistic.

  • STAT_VALUE, value of the statistic.

Examples

Original data:

>>> df.collect()
    TIME  STATUS  OCCURRENCES  GROUP
0      9       1            1      2
1     10       1            1      1
2      1       1            2      0
3     31       0            1      1
4      2       1            1      0
5     25       1            3      1
6    255       0            1      0
7     90       1            1      0
8     22       1            1      1
9    100       0            1      1
10    28       0            1      0
11     5       1            1      1
12     7       1            1      1
13    11       0            1      0
14    20       0            1      0
15    30       1            2      2
16   101       0            1      2
17     8       0            1      1

Perform the function:

>>> from hana_ml.algorithms.pal.stats import kaplan_meier_survival_analysis
>>> survival_estimates, res, stats = kaplan_meier_survival_analysis(df)
>>> res.collect()
  GROUP  TOTAL_RISK  OBSERVED  EXPECTED  LOGRANK_STAT
0     0           8         4  4.353652      0.045712
1     1          10         7  6.024638      0.307951
2     2           4         3  3.621710      0.161786
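The TOTAL_RISK, OBSERVED and EXPECTED columns above can be cross-checked without a HANA connection. The sketch below is a plain-Python version of the standard log-rank bookkeeping, under the assumption that OCCURRENCES acts as a row weight; the helper name `logrank_counts` is hypothetical and this is not the PAL code. It makes no attempt to reproduce LOGRANK_STAT, since the source does not state the formula behind that column.

```python
# (TIME, STATUS, OCCURRENCES, GROUP) rows copied from the example input
rows = [
    (9, 1, 1, 2), (10, 1, 1, 1), (1, 1, 2, 0), (31, 0, 1, 1), (2, 1, 1, 0),
    (25, 1, 3, 1), (255, 0, 1, 0), (90, 1, 1, 0), (22, 1, 1, 1), (100, 0, 1, 1),
    (28, 0, 1, 0), (5, 1, 1, 1), (7, 1, 1, 1), (11, 0, 1, 0), (20, 0, 1, 0),
    (30, 1, 2, 2), (101, 0, 1, 2), (8, 0, 1, 1),
]


def logrank_counts(rows, event_indicator=1):
    """Return {group: (total_risk, observed, expected)}. Hypothetical helper."""
    groups = sorted({g for _, _, _, g in rows})
    event_times = sorted({t for t, s, _, _ in rows if s == event_indicator})

    total = {g: 0 for g in groups}
    observed = {g: 0 for g in groups}
    expected = {g: 0.0 for g in groups}

    for _, status, weight, group in rows:
        total[group] += weight
        if status == event_indicator:
            observed[group] += weight

    for t in event_times:
        at_risk = {g: 0 for g in groups}
        deaths = 0  # events at time t, pooled over all groups
        for time, status, weight, group in rows:
            if time >= t:
                at_risk[group] += weight
            if time == t and status == event_indicator:
                deaths += weight
        n = sum(at_risk.values())
        # Under the null hypothesis, events are shared out in proportion
        # to each group's size within the risk set.
        for g in groups:
            expected[g] += deaths * at_risk[g] / n

    return {g: (total[g], observed[g], expected[g]) for g in groups}


counts = logrank_counts(rows)
for group, (total_risk, obs, exp) in counts.items():
    print(group, total_risk, obs, round(exp, 6))
```

With the example data, the three columns computed this way agree with the table above up to rounding in the last printed digit.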