surv-extrapolation/other-hybrid-methods.qmd at main · ICON-in-R/surv-extrapolation · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
---
title: "Other hybrid methods"
bibliography: references.bib
format:
  html:
    code-copy: true
editor_options:
  chunk_output_type: console
---

# Gelber method

## Background

A method similar to that developed by LRIG involved fitting an appropriate model to the tail of the Kaplan Meier curve and using the estimated model combined with the Kaplan Meier to produce an estimate of total mean survival. [@Gelber1993] state that this method is of particular use when it is easier to fit an appropriate parametric model to the tail of a Kaplan Meier rather than to the Kaplan Meier as a whole. In the method proposed by [@Gelber1993] log-cumulative hazard probability plots are used to determine the appropriate parametric model and to determine the appropriate values for the point at which the parametric curve takes over from the Kaplan Meier. Both the Gelber method and the LRIG Exponential method are likely to be sensitive to the point at which the parametric model takes over from the Kaplan Meier, $s_0$, and therefore if either of these methods are used it is important to provide clear rationale for the switch point using statistical analysis.

$$
\hat{S}(t) = \hat{S}_{KM}(s_0) S^*(t)/S^*(s_0) \mbox{ for } t > s_0
$$ This formulation from the original paper takes a combined (product) survival of the Kaplan-Meier and parametric curve after time $s_0$. In the following we will estimate a single extrapolated curve instead for simplicity.

## R Examples

We will use the `ovarian` data set from the `flexsurv` package. First, let us show how to get the KM and parametric fit using the total data set.

```{r warning=FALSE, message=FALSE}
library(flexsurv)
library(dplyr)

# survival data

# kaplan-meier before cut point
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist = "gompertz")

plot(fitg)
```

Now, let us pick an arbitrary cut point at time 100 and find the KM probability of survival at that time.

```{r}
fit_km <- survfit(formula = Surv(futime, fustat) ~ 1, data = ovarian)

t_cutpt <- 400
p_cutpt <- min(fit_km$surv[fit_km$time < t_cutpt])
```

We wish to fit the parametric curve to the data after the cut point so remove the data before this time.

```{r}
data <- mutate(ovarian, cutpt = futime < t_cutpt)
data <- split(data, f = data$cutpt)
```

and fit the curve, in this case a gompertz,

```{r}
fitcp <- flexsurvreg(formula = Surv(futime - t_cutpt, fustat) ~ 1, data = data$`FALSE`, dist = "gompertz")
```

It just remains to summarise the fitted curve and plot it with the KM.

```{r}
summ <- summary(fitcp, t = seq(0, 1200, 10))

S_cp <- summ[[1]]$est[summ[[1]]$time == 400]

plot(fit_km)
lines(x = summ[[1]][["time"]] + t_cutpt,
      y = summ[[1]][["est"]]*p_cutpt,
      type = "l", col = "red", lwd = 2)
```

Notice that `p_cutpt` is the KM survival probability at the cut point $\hat{S}_{KM}(s_0)$.

For convenience, we have written a package to do all of these steps called `gelber_hybrid_curve`.

```{r}
#| code-fold: true
#| code-summary: "gelber_hybrid_curve() function"

gelber_hybrid_curve <- function(formula =  Surv(futime, fustat) ~ 1,
                                data = ovarian,
                                dist = "weibull",
                                t_cutpt = 100, ...) {

  # kaplan-meier before cut point
  fit_km <- survfit(formula = formula, data = data)
  plot(fit_km, ...)
  xylims <- par("usr")

  data_cutpt <- fitcp <- summ <- list()

  for (i in seq_along(t_cutpt)) {

    # what is S_hat at cut point?
    p_cutpt[i] <- min(fit_km$surv[fit_km$time < t_cutpt[i]])

    ucl_cutpt <- min(fit_km$upper[fit_km$time < t_cutpt[i]])
    lcl_cutpt <- min(fit_km$lower[fit_km$time < t_cutpt[i]])

    u95_km <- ucl_cutpt[i] - p_cutpt
    l95_km <- p_cutpt[i] - lcl_cutpt

    data_cutpt[[i]] <-
      data |>
      filter(futime >= t_cutpt[i]) |>
      mutate(futime = futime - t_cutpt[i])

    # parametric curve after cut point
    fitcp[[i]] <- flexsurvreg(formula = formula,
                              data = data_cutpt[[i]],
                              dist = dist)

    summ[[i]] <- summary(fitcp[[i]], t = seq(0, xylims[2], 10))

    lines(x = summ[[i]][[1]][["time"]] + t_cutpt[i],
          y = summ[[i]][[1]][["est"]]*p_cutpt[i],
          type = "l", col = "red", lwd = 2)

    if (length(t_cutpt) == 1) {
      lines(x = summ[[i]][[1]][["time"]] + t_cutpt[i],
            y = pmax(0, summ[[i]][[1]][["lcl"]]*p_cutpt[i] - l95_km),
            type = "l", col = "red", lty = 2)

      lines(x = summ[[i]][[1]][["time"]] + t_cutpt[i],
            y = pmax(0, summ[[i]][[1]][["lcl"]]*p_cutpt[i]),
            type = "l", col = "red", lty = 3)

      lines(x = summ[[i]][[1]][["time"]] + t_cutpt[i],
            y = pmin(1, summ[[i]][[1]][["ucl"]]*p_cutpt[i] + u95_km),
            type = "l", col = "red", lty = 2)

      lines(x = summ[[i]][[1]][["time"]] + t_cutpt[i],
            y = pmin(1, summ[[i]][[1]][["ucl"]]*p_cutpt[i]),
            type = "l", col = "red", lty = 3)
    }
  }
}
```

This can be used to explore the effect of using different cut points and different distributions.

```{r}
gelber_hybrid_curve(t_cutpt = 400, dist = "exp")
gelber_hybrid_curve(t_cutpt = 400, dist = "lognormal")
gelber_hybrid_curve(t_cutpt = c(100, 400, 600), xlim = c(0, 2000))
gelber_hybrid_curve(t_cutpt = c(100, 400, 600), dist = "lognormal", xlim = c(0, 2000))
```