simba/baseline.qmd at main · dd-harp/simba · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
---
title: "Counterfactual Baseline"
subtitle: "A Changing Baseline Modified by Vector Control"
editor:
  mode: source
bibliography: data/refs.bib
csl: data/plos.csl
format: html
---

------------------------------------------------------------------------

*Given a time series describing the parasite rate in a population, we fit a model to a PR time series to understand malaria as a changing baseline that has been modified by vector control.*

------------------------------------------------------------------------

Suppose we start with a time series that we want to analyze. We have two bits of data: the imputed *Pf*PR over some time period, and the timing of vector control. We start out with a model that has been fitted to data (see [The History of Malaria](history.qmd)). At this point, the model was fit without using any information about the history of vector control. What we want to do now is to construct a model that reconstructs this time series as a **counterfactual baseline** with that includes interventions with estimated effect sizes.

```{r}
library(ramp.xds)
library(ramp.uganda)
library(ramp.control)
library(ramp.work)
```

```{r}
#devtools::load_all("~/git/ramp.control")
#devtools::load_all("~/git/ramp.work")
```

Consider the following example. The green vertical lines are mass distributions of ITNs, the type listed above (PBO or Royal Guard); the blue vertical lines are IRS, the insecticide tagged above (Fludora Fusion or Actellic). We note that the first ITN distribution, with PBO nets called `PBO 0` was about a year before our time series starts. The next two ITN distributions, called `PBO 1` and `PBO 2` occurred on days 776 and 2175, respectively. Another ITN distribution, with Royal Guard nets, `RG` occurred on day 3240. This district also got IRS on days 2861, 3256, and 3622: the first one, `Flu Fu` was done with Fludora Fusion; the next two, called `Act 1` and `Act 2` used Actellic. This is a picture of what happened: the data in black with a fitted model in red:

```{r, echo=F, eval=F}
dname = "Terego District"
prts <- get_district_pfpr(dname)
with(prts, {
  plot(jdate, pfpr, xlim = range(jdate, -250), ylim = c(0,0.7), main = "Example District", xlab = "Julian Date (day 1 is Jan 1, 2015)", ylab = "PR")
  lines(jdate, pfpr)

  irs <- get_irs_jdates(dname, 2015)
  irs_type = c("Flu Fu", "Act 1", "Act 2")
  for(i in 1:length(irs)){
    segments(irs[i], 0, irs[i], .55, col = "darkblue")
    text(irs[i], 0.6, irs_type[i], col = "darkblue")
  }

  itn <- sort(get_itn_jdates(dname, 2015))
  itn_type= c("PBO 0", "PBO 1", "PBO 2", "RG")
  for(i in 1:length(itn)){
    segments(itn[i], 0, itn[i], .55, col = "darkgreen")
    text(itn[i], 0.65, itn_type[i], col = "darkgreen")
  }
})

fit_mod <- readRDS("fit_mod.rds")
xds_plot_PR(fit_mod, add=TRUE, clrs = "darkred")
```

The analysis we present is largely *descriptive.* These are all the data we have to describe malaria in a population of around 200,000 people. Our goal is to do the best we can with the data we've got so that we can develop sensible policy advice. Constructive criticism is welcome, but we've got a round file for other kinds of comments.

# A Counterfactual Baseline

To reconstruct the counterfactual baseline, we will go through the list of interventions one at a time, and we need to make a determination of the effect size of the intervention, contingent on what would have happened otherwise.

The core challenge is that we need to estimate **two** things from *one* time series, so there is some intrinsic *uncertainty.*

The time series that was observed *could* have happened in many ways. We thus need an algorithm capable of propagating that uncertainty. The algorithm works like this:

-   Build a model and fit it to the data.

-   For the first intervention, reconstruct history as a baseline, modified by control:

    -   Impute a baseline

    -   Contingent on the imputed baseline, estimate the effect size

-   For the next intervation, estimate the effect size, contingent on the previous

To show how this works, we will walk through the time series for the district above.

Our goal here is to walk through the analysis to draw attention to the issues.

## PBO 1

The first ITN distribution, `PBO 0` occurred 351 days before our first data point, so it is possible that what we're seeing at the beginning is a result of that distribution. The second ITN distribution, the one we want to assess, occurred on day 776.

In this model, the trend is configured as a spline that modifies a signal with a mean and a seasonal pattern. The spline knots and nodes are:

```{r, eval=F}
rbind(fit_mod$Lpar[[1]]$trend_par$tt[2:5],
round(1000*fit_mod$Lpar[[1]]$trend_par$yy[2:5])/1000)
```

That third node is the one we want to consider replacing. The mean value of the two that precede it are:

```{r, eval=F}
mean(fit_mod$Lpar[[1]]$trend_par$yy[c(2,3)])
```

The fitted value we would want to replace is:

```{r, eval=F}
fit_mod$Lpar[[1]]$trend_par$yy[4]
```

Based on this, we conclude it is pointless to try and fit an effect size to an intervention model. The model fitting would surely return *no effect.* For this reconstruction, we thus conclude that `PBO 1` did not have any effect, and we move on.

## PBO 2

The next ITN distribution, called `PBO 1` is on day 2,175.

Now, our task of reconstructing a baseline is more complicated. The fitted values from the first 6 years of data, and it looks like we've got a trend:

```{r, eval=F}
knots <- fit_mod$Lpar[[1]]$trend_par$yy[2:7]
tm <- c(1:6)
plot(tm, knots)
llm <- lm(knots~tm)
abline(llm)
```

Malaria prevalence has increased by about 25% over six years of observation. Using a linear model, the trend is already statistically significant.

```{r, eval=F}
summary(llm)
```

With only six years, it's hard to have a lot of confidence about what we would expect to happen next.

In fact, the upward trend does not continue, but the inflection point coincides with `PBO 2`. What happened? Did the trend stop on its own, or did `PBO 2` reverse the upwards trend?

At this point, after the nets go out, the upward trend stops. Was the trend real?

We don't learn a ton more with hindsight, because what happens next is also modified by vector control: a little less than two years later, the district implements IRS with Fludora Fusion. a year after that, after mass distributing Royal Guard nets and IRS with Actellic, malaria prevalence falls to around 20% of its long-term average. Here, we contrast two baseline scenarios: the projected linear trend (red); and regression to the mean (blue). These are compared to the fitted model (black). The solid vertical line segment marks the day of the intervention, and the dashed vertical line is one year later.

```{r, eval=F}
impute_baseline_linear = function(ix, model){
  yy <- head(model$Lpar[[1]]$trend_par$yy, ix-1)
  tt <- head(model$Lpar[[1]]$trend_par$tt, ix-1)
  yyv <- yy[which(tt>0)]
  llm <- lm(yyv ~ c(1:length(yyv)))
  c(yyv, tail(yyv, 1) + llm$coef[2])
}
plot(impute_baseline_linear(8, fit_mod))
```

```{r, eval=F}
projected <- .802930 + .0372*c(7:9)
projected
```

```{r, eval=F}
fit_mod1 <- fit_mod
fit_mod1$Lpar[[1]]$trend_par$yy[8:10] <- projected
fit_mod1$Lpar[[1]]$F_trend <-
      make_function(fit_mod1$Lpar[[1]]$trend_par)
fit_mod1 <- xds_solve(fit_mod1, 2550, 10)

fit_mod2 <- fit_mod
fit_mod2$Lpar[[1]]$trend_par$yy[8:10] <-
  mean(fit_mod$Lpar[[1]]$trend_par$yy[2:7])
fit_mod2$Lpar[[1]]$F_trend <-
  make_function(fit_mod2$Lpar[[1]]$trend_par)
fit_mod2 <- xds_solve(fit_mod2, 2550, 10)

xds_plot_PR(fit_mod1, clrs = "darkred")
xds_plot_PR(fit_mod2, clrs = "darkblue", add=TRUE)
xds_plot_PR(fit_mod, add=TRUE)
segments(2175, 0, 2175, 1)
segments(2540, 0, 2540, 1, lty = 2)
```

If we used the linear model value for the baseline, the PBO net distribution would need to explain a 20% reduction in *Pf*PR (darkred). If we assume regression to the mean from the previous six years (darkblue), the reduction was about 6%; the observed value in the seventh year is close to the average value observed in the previous six.

```{r, eval=F}
fit_mod1$Lpar[[1]]$trend_par$yy[8]/
  fit_mod$Lpar[[1]]$trend_par$yy[8]
mean(fit_mod$Lpar[[1]]$trend_par$yy[2:7])/
  fit_mod$Lpar[[1]]$trend_par$yy[8]
```

We need to make a decision, now, about how to handle the effect size of `PBO 2` on malaria prevalence. This history already assumes that `PBO 1` had no effect, and a 6% change in *Pf*PR is not outside the normal variability. A reasonable conclusion is thus that it had no effect.

## Flu Fu

686 days after `PBO 2`, on day 2,861, the district got IRS with Fludora Fusion. Once again, we create a baseline by setting the values from the previous 7 years, we simulate forward one year and we get this picture (blue is the putative baseline).

```{r, eval=F}
fit_mod3 <- fit_mod
fit_mod3$Lpar[[1]]$trend_par$yy[9:10] <-
  mean(fit_mod$Lpar[[1]]$trend_par$yy[2:8])
fit_mod3$Lpar[[1]]$F_trend <-
  make_function(fit_mod3$Lpar[[1]]$trend_par)
fit_mod3 <- xds_solve(fit_mod3, 3230, 10)
xds_plot_PR(fit_mod3, clrs = "darkblue")
xds_plot_PR(fit_mod, clrs = grey(0.9), add=TRUE)
segments(2861, 0, 2861, 1)
segments(3226, 0, 3226, 1)
```

## ACT 1 + RG

In the last step, In modeling the effect sizes of the interfNow, we have a different problem. We have a massive change in malaria following the implementation of two kinds of vector control, but we are not certain about which one did the work.

## Act 2

While there are no data, we can make a forecast of the effects of the actellic IRS round, `Act 2`, that could help us to resolve some of our open questions raised by the concurrence of `RG` and `Act 1.` With these data, at least, we will not be able to evaluate `Act 2`. Maybe next year.

# Algorithm

We want to write an algorithm that implements, in some reasonable ways, all the steps described above:

## Pseudocode

**INPUTS**

-   We pass a *Pf*PR time series;

-   We pass a list of the vector control interventions:

    -   IRS rounds by type, timing, and coverage;

    -   Bed Net rounds by type, timing, and coverage;

**ALGORITHM**

-   We set up a base model and fit it to the time series, a *history* of malaria;

-   We reconstruct the history of malaria as a changing baseline modified by control;

**OUTPUTS**

We return two models:

-   A *counterfatual baseline* model. This is the counterfactual baseline -- what would have happened with no interventions;

-   A full model, including the *counterfactual baseline* plus a configured set of interventions with effect sizes. The full model should fit the data about as well as the *history* model.

------------------------------------------------------------------------

The algorithm works like this:

1.  Fit a model to the data: the *history* model.

2.  Set up a model with multiple rounds of control, setting coverage to 0.

3.  For every intervention, in order modify the *history* model:

    -   Replace the knot that follows the mass distribution

    -   Set coverage to the observed value

    -   Fit the contact parameter

4.  Return the two models:

    -   the *baseline, modified by control* model, as fitted in 3

    -   the *baseline* with *no* interventions.

## Flexibility and Constraints

The algorithm works like the process we described, but we should acknowledge two concerns:

-   There are usually several reasonable ways to reconstruct the baseline. The replacement knot value is:

    -   the mean of the previous $n$ knots

    -   the median of the previous $n$ knots

    -   a linear projection of the previous $n$ knots

    -   subsampled from the previous $n$ knots

    -   drawn from a distribution fitted to the previous $n$ knots

    -   done in some other reasonable way.

-   When two or more modes of vector control are implemented at the same time, we can not estimate the effect sizes for the individual interventions, so we estimate a *joint* effect size.