27  Kaplan-Meier

If we assume that every subject follows the same survival function (no covariates or other individual differences), we can easily estimate the survival function \(S(t)\) non-parametrically using the Kaplan-Meier (or product-limit) method.

27.1 Kaplan-Meier survival curve

PDF Example calculation

The Kaplan-Meier estimator is the product over the failure times of the conditional probabilities of surviving to the next failure time:

\[\hat S(t) = \prod_{t_i \leq t}(1-\hat q_i) = \prod_{t_i \leq t}\left(1 - \frac{d_i}{n_i} \right)\]

  • \(n_i\) is the number of subjects at risk at time \(t_i\)
  • \(d_i\) is the number of individuals who fail at that time
  • \(\hat q_i = d_i/n_i\) is the estimated conditional probability of failing at \(t_i\), given survival up to \(t_i\)

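Equivalently, starting from \(\hat S(t_0) = 1\), the estimate can be built up recursively from one failure time to the next:

\[\hat S(t_{i}) = \hat S(t_{i-1}) \times \hat P(T > t_{i} | T \geq t_{i}) = \prod_{j=1}^i \hat P(T > t_{j} | T \geq t_{j})\]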

The median survival time is the smallest \(t\) such that the survival function is less than or equal to 0.5: \(\hat t_{med} = \mbox{inf}\left\{t:\hat S(t) \leq 0.5 \right\}\).
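The product-limit formula and the median rule can be checked by hand. The following is a minimal base-R sketch on six illustrative observations (the names `t_i`, `n_i`, `d_i`, `S_hat` and `t_med` are chosen here only to mirror the notation above); it is meant to show the arithmetic, not to replace a library routine.

```r
# Illustrative data: follow-up times and event indicator (1 = failed, 0 = censored)
tt   <- c(7, 6, 6, 5, 2, 4)
cens <- c(0, 1, 0, 0, 1, 1)

# Distinct failure times t_i, in increasing order
t_i <- sort(unique(tt[cens == 1]))

# n_i: subjects still at risk at t_i; d_i: failures occurring at t_i
n_i <- sapply(t_i, function(t) sum(tt >= t))
d_i <- sapply(t_i, function(t) sum(tt == t & cens == 1))

# Product-limit estimate: cumulative product of (1 - d_i / n_i)
S_hat <- cumprod(1 - d_i / n_i)
print(data.frame(t_i, n_i, d_i, S_hat))

# Median survival: smallest failure time with S_hat <= 0.5 (NA if never reached)
t_med <- if (any(S_hat <= 0.5)) min(t_i[S_hat <= 0.5]) else NA
t_med
```

Here the estimate drops to 5/6, 2/3 and 4/9 at the failure times 2, 4 and 6, so the estimated median is 6. Censored observations never cause a drop; they only reduce the risk set for later failure times.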

95% confidence interval for Kaplan-Meier: \[\hat S(t) \pm 1.96 \sqrt{\hat{Var}(\hat S(t))}\]

Greenwood’s variance formula: \[\hat{Var}\left(\hat S(t) \right) \approx [\hat S(t)]^2 \sum_{t_i \leq t}\frac{d_i}{n_i(n_i-d_i)}\]
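Greenwood’s formula and the resulting interval can also be evaluated directly. This is a base-R sketch on six illustrative observations; truncating the limits to \([0, 1]\) is an assumption added here, because the plain \(\pm 1.96\) interval can fall outside that range in small samples.

```r
# Illustrative data: follow-up times and event indicator (1 = failed, 0 = censored)
tt   <- c(7, 6, 6, 5, 2, 4)
cens <- c(0, 1, 0, 0, 1, 1)

# Risk sets, failure counts and Kaplan-Meier estimate at each failure time
t_i   <- sort(unique(tt[cens == 1]))
n_i   <- sapply(t_i, function(t) sum(tt >= t))
d_i   <- sapply(t_i, function(t) sum(tt == t & cens == 1))
S_hat <- cumprod(1 - d_i / n_i)

# Greenwood: Var(S_hat) ~ S_hat^2 * cumulative sum of d_i / (n_i * (n_i - d_i))
# (undefined when n_i == d_i, which does not occur in this data set)
var_hat <- S_hat^2 * cumsum(d_i / (n_i * (n_i - d_i)))
se_hat  <- sqrt(var_hat)

# Plain 95% interval, truncated to [0, 1] (truncation is an added assumption)
lower <- pmax(0, S_hat - 1.96 * se_hat)
upper <- pmin(1, S_hat + 1.96 * se_hat)
print(data.frame(t_i, S_hat, se_hat, lower, upper))
```

At the first failure time the untruncated upper limit exceeds 1. This is one reason software often builds the interval on a transformed scale instead, such as the log scale.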

Example

library(survival)
# data
tt <- c(7, 6, 6, 5, 2, 4)
cens <- c(0, 1, 0, 0, 1, 1) # 0 censored, 1 failed
# Surv() produces a special structure for censored survival data
Surv(tt, cens)
[1] 7+ 6  6+ 5+ 2  4 
# Survival function
res <- survfit(Surv(tt, cens) ~ 1)
plot(res)
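Besides plotting, the fitted object can be tabulated. The sketch below assumes the same six observations and uses `summary()` and `quantile()` on the `survfit` object; the data frame `km_table` is just a name chosen here for display.

```r
library(survival)

# Same illustrative data as in the example
tt   <- c(7, 6, 6, 5, 2, 4)
cens <- c(0, 1, 0, 0, 1, 1)          # 0 censored, 1 failed
res  <- survfit(Surv(tt, cens) ~ 1)

# summary() returns one entry per failure time
s <- summary(res)
km_table <- data.frame(time = s$time, n.risk = s$n.risk,
                       n.event = s$n.event, surv = s$surv)
print(km_table)

# Estimated median survival time (point estimate only)
med <- quantile(res, probs = 0.5, conf.int = FALSE)
med
```

The `surv` column should match the product-limit values 5/6, 2/3 and 4/9 obtained from the formula.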

Kaplan-Meier survival curve proof

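Let \(t(f)\) denote the \(f\)-th ordered failure time and define the events

\(A = \{T \geq t(f)\}\)

\(B = \{T > t(f)\}\)

Since \(B\) implies \(A\), \(A \mbox{ and } B = B\)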

\(P(A \mbox{ and } B) = P(B) = P(T > t(f)) = S(t(f))\)

\(t(f)\) is the next failure time after \(t(f-1)\), so there are no failures strictly between \(t(f-1)\) and \(t(f)\). Since no failure occurs during \(t(f-1) < T < t(f)\), the event \(A = \{T \geq t(f)\}\) is the same as \(\{T > t(f-1)\}\).

\(P(A) = P(T \geq t(f)) = P(T > t(f-1)) = S(t(f-1))\)

\(P(B|A)= P(T > t(f) | T \geq t(f))\)

\(P(A \mbox{ and } B) = P(A) \times P(B|A)\)

\(S(t(f)) = S(t(f-1)) \times P(T > t(f)| T \geq t(f))\)

27.2 Example

This example uses the lung cancer data set (lung) from the survival package. Its variables are:

  • inst: Institution code
  • time: Survival time in days
  • status: censoring status 1=censored, 2=dead
  • age: Age in years
  • sex: Male=1 Female=2
  • ph.ecog: ECOG performance score as rated by the physician. 0 = asymptomatic, 1 = symptomatic but completely ambulatory, 2 = in bed <50% of the day, 3 = in bed >50% of the day but not bedbound, 4 = bedbound
  • ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician
  • pat.karno: Karnofsky performance score as rated by patient
  • meal.cal: Calories consumed at meals
  • wt.loss: Weight loss in last six months

library(survival)
# lung is stored with the cancer data sets of the survival package
data(cancer, package = "survival")
head(lung)
  inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1    3  306      2  74   1       1       90       100     1175      NA
2    3  455      2  68   1       0       90        90     1225      15
3    3 1010      1  56   1       0       90        90       NA      15
4    5  210      2  57   1       1       90        60     1150      11
5    1  883      2  60   1       0      100        90       NA       0
6   12 1022      1  74   1       1       50        80      513       0

Compute one Kaplan-Meier curve per sex with the survfit() function of the survival package. Surv() accepts the 1/2 coding of status, treating 2 as the event.

fit <- survfit(Surv(time, status) ~ sex, data = lung)
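Before plotting, it can help to look at the fitted object itself. This is a short sketch, assuming the lung data set is available after loading survival; the time points 180 and 365 days are arbitrary illustrative choices.

```r
library(survival)

# One Kaplan-Meier curve per sex (1 = male, 2 = female)
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Number of subjects, number of events and median survival per stratum
print(fit)

# Estimated survival probabilities at selected time points (in days)
summary(fit, times = c(180, 365))
```

The printed medians are the values that surv.median.line marks in a ggsurvplot() figure.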

Visualizing survival curves with the ggsurvplot() function of the survminer package.

library(survminer)
# Change color, linetype by strata, risk.table color by strata
ggsurvplot(fit,
          pval = TRUE, conf.int = TRUE,
          risk.table = TRUE, # Add risk table
          risk.table.col = "strata", # Change risk table color by groups
          linetype = "strata", # Change line type by groups
          surv.median.line = "hv", # Specify median survival
          ggtheme = theme_bw(), # Change ggplot2 theme
          palette = c("#E7B800", "#2E9FDF"))

The log-rank test can be used to compare the survival curves of the two groups; with pval = TRUE, ggsurvplot() prints its p-value on the plot.
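The test can also be run directly with survdiff() from the survival package. The sketch below assumes the lung data set and compares the two sexes; the p-value is recomputed from the chi-square statistic so that the code does not rely on how a particular package version stores it.

```r
library(survival)

# Log-rank test of equal survival in the two sex groups
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
print(lr)

# p-value from the chi-square statistic, with (number of groups - 1) degrees of freedom
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p_value
```

A small p-value indicates that the observed difference between the curves is unlikely under the hypothesis of identical survival functions.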

27.3 Exercises

PDF Exercise survival

PDF Exercise survival - Solution