
Commit 2faebd8

Merge pull request #142 from epiforecasts:plan-updates
Add project overview and context for analysis
2 parents 4aea896 + bb17072 commit 2faebd8

7 files changed

Lines changed: 449 additions & 97 deletions


CLAUDE.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# Project Overview

This is an R-based research project analyzing COVID-19 forecast accuracy across European models. The study examines how model structure and geographic specificity influence forecast performance using data from the European COVID-19 Forecast Hub.

## Research Question

How do model structure (mechanistic vs statistical) and geographic specificity (single-location vs multi-location models) affect forecast accuracy after adjusting for predictive difficulty?

See `report/Research-narrative.md` for additional project context.

## Project Structure

### Core Analysis Scripts (R/)

- **process-score.R**: Computes forecast scores using the `scoringutils` package
  - Scores forecasts on both natural and log scales
  - Calculates weighted interval scores (WIS)
  - Outputs: `data/scores-raw-{case|death}.csv`

- **process-data.R**: Data preparation and integration
  - Combines scores with explanatory variables (model classification, variant phases, country targets)
  - Calls utility functions for metadata, variants, and location data

- **analysis-model.R**: Main statistical analysis using Generalized Additive Mixed Models (GAMM)
  - Models WIS adjusting for trend, location, time, horizon, and model-specific effects
  - Isolates the effects of Method (model structure) and CountryTargets (geographic specificity)
  - Uses the `mgcv`, `gammit`, and `gratia` packages
  - Outputs: `output/results.rds`

- **analysis-descriptive.R**: Descriptive statistics and summary tables
  - Bootstrap confidence intervals
  - Score distributions by model characteristics

- **plot-model-results.R**: Visualization of GAMM model effects
  - Adjusted vs unadjusted effects by model
  - Supports anonymized output for peer review

- **plot-model-flow.R**: Workflow and flowchart visualizations

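For reference, the weighted interval score (WIS) computed in `process-score.R` can be written in its standard form, as implemented in `scoringutils`. Take a forecast F with median m and K central prediction intervals at levels (1 − α_k), each with lower bound l and upper bound u, evaluated against an observation y. This is a notational sketch only. The quantile levels actually scored are not restated here.

$$
\mathrm{IS}_{\alpha}(F, y) = (u - l) + \frac{2}{\alpha}(l - y)\,\mathbf{1}(y < l) + \frac{2}{\alpha}(y - u)\,\mathbf{1}(y > u)
$$

$$
\mathrm{WIS}(F, y) = \frac{1}{K + 1/2}\left( \frac{1}{2}\,\lvert y - m \rvert + \sum_{k=1}^{K} \frac{\alpha_k}{2}\,\mathrm{IS}_{\alpha_k}(F, y) \right)
$$

As a worked example, take a single 50% interval (K = 1, α_1 = 0.5) of [80, 120], with median 100 and observed value 130. The interval score is 40 + (2/0.5)(130 − 120) = 80, so WIS = (0.5 × 30 + 0.25 × 80) / 1.5 ≈ 23.3. Scoring on the log scale applies the same formula after transforming both forecasts and observations. `scoringutils::transform_forecasts()` supports this, commonly with log(x + 1), but the exact transformation used by `process-score.R` is an assumption here.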
### Utility Scripts (R/)

- **utils-data.R**: Functions for accessing forecasts, observations, and population data
- **utils-metadata.R**: Model names, submissions, and metadata classification helpers
- **utils-variants.R**: COVID-19 variant phase classification

### Data (data/)

- `covid19-forecast-hub-europe.parquet`: Raw forecast submissions
- `observed-{case|death}.csv`: Observed incidence data
- `model-classification.csv`: Model categorization by structure and specificity
- `populations.csv`: Population data by location
- `scores-raw-{case|death}.csv`: Computed forecast scores (generated)

### Manuscript text (prose and writing)

- `report/Revision_manuscript.md`: full manuscript text (title, abstract, background, methods, results, discussion). **Edit this file for any writing changes.**
- `report/Research-narrative.md`: overall narrative of the research, and a paragraph-by-paragraph one-line summary of the manuscript text
- `submission/reviewer-response-analysis.md`: tracks reviewer suggestions and the planned response; an X marks completion. Consult when making revision-related changes.

### Rendered analysis (code and outputs)

- `report/results.qmd`: active Quarto document; sources the R scripts and renders figures and tables for the results section
- `report/supplement/Supplement.Rmd`: supplementary materials; sources the same R scripts
- `report/results.Rmd`: legacy R Markdown copy of the results (inactive; use the `.qmd`)
- Pre-print: [medRxiv 10.1101/2025.04.10.25325611](https://doi.org/10.1101/2025.04.10.25325611)

**Note**: manuscript prose and rendered analysis are separate. `Revision_manuscript.md` is not auto-generated, so changes to analysis code and changes to manuscript text must be coordinated manually.

## Reproducing the Analysis

### Setup Environment

```r
# Install renv only if it is not already available
if (!requireNamespace("renv", quietly = TRUE)) {
  install.packages("renv")
}

# Restore the package environment from the project lockfile
renv::restore()
```

### Run Analysis Pipeline

```r
library(here)

# 1. Score forecasts on natural and log scales
source(here("R", "process-score.R"))

# 2. Prepare and integrate data
source(here("R", "process-data.R"))

# 3. Fit GAMM to weighted interval scores
source(here("R", "analysis-model.R"))

# 4. Generate reports
# Render the Quarto results document (requires the Quarto CLI and the quarto R package)
quarto::quarto_render(here("report", "results.qmd"))
# Knit the supplementary materials
rmarkdown::render(here("report", "supplement", "Supplement.Rmd"))
```

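Step 3 fits a GAMM with `mgcv`. The formula in `R/analysis-model.R` is not reproduced here. Instead, the following minimal sketch on simulated data shows how the terms listed in `report/Research-narrative.md` could be expressed as an `mgcv::gam()` formula:

- fixed effects for Method and CountryTargets, plus a trend adjustment
- a horizon-by-model factor smooth
- random effects for variant phase within location, location, and model

All column names and factor levels, the `log(wis)` Gaussian response, and the basis dimension `k = 3` are assumptions for illustration.

```r
# Minimal sketch on simulated data; NOT the formula used in R/analysis-model.R.
# Column names, factor levels, and the log(wis) Gaussian response are assumptions.
library(mgcv)

set.seed(1)

# Hypothetical model-level classification: structure and geographic specificity
models <- data.frame(
  model = factor(paste0("model", 1:8)),
  method = factor(rep(c("mechanistic", "statistical"), times = 4)),
  country_targets = factor(rep(c("single", "multi"), each = 4))
)

# One simulated score per model x location x horizon x week
scores <- expand.grid(
  model = models$model,
  location = factor(paste0("loc", 1:6)),
  horizon = 1:4,
  week = 1:10
)
scores <- merge(scores, models, by = "model")
scores$trend <- factor(sample(c("decreasing", "stable", "increasing"),
                              nrow(scores), replace = TRUE))
scores$variant_phase <- factor(ifelse(scores$week <= 5, "phase1", "phase2"))
# Positive synthetic scores that increase on average with horizon
scores$wis <- exp(rnorm(nrow(scores), mean = 0.2 * scores$horizon, sd = 0.5))

fit <- gam(
  log(wis) ~ method + country_targets + trend +  # fixed effects
    s(horizon, model, bs = "fs", k = 3) +         # horizon-by-model factor smooth
    s(variant_phase, location, bs = "re") +       # variant phase within location
    s(location, bs = "re") +                      # location random effect
    s(model, bs = "re"),                          # model random effect
  data = scores,
  method = "REML"
)

# Parametric estimates for the two exposures of interest
summary(fit)$p.table[c("methodstatistical", "country_targetssingle"), ]
```

Because Method and CountryTargets are model-level attributes, this sketch relies on the penalized model random effects and per-model smooths to separate them from other model-specific variation. The real analysis may parameterize this differently.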
## Making Changes

| Task | Where to edit |
|---|---|
| Change manuscript prose (wording, framing, conclusions) | `report/Revision_manuscript.md`, then update `report/Research-narrative.md` |
| Change analysis, model, or figures | Relevant `R/` script; outputs flow into `report/results.qmd` when it is re-rendered |
| Respond to a reviewer comment | Check `report/Revision_reviews-response.md`, update the `R/` script if needed, then update `report/Revision_manuscript.md`, mark the comment as completed in `report/Revision_reviews-response.md`, and close the relevant GitHub issue with a note |
| Add or change a supplementary figure | Relevant `R/` script + `report/supplement/Supplement.Rmd` |
| All changes | Update `Plan.md` |

## Dependencies

Major R packages:

- `mgcv` - Generalized Additive Models
- `gammit` - GAMM utilities
- `gratia` - GAM plotting
- `scoringutils` - Forecast scoring
- `arrow` - Parquet file handling
- `tidyverse` ecosystem (dplyr, tidyr, ggplot2, readr, purrr)
- `here` - Path management
- `lubridate` - Date handling

## Publications

- **DOI**: [10.5281/zenodo.14903161](https://doi.org/10.5281/zenodo.14903161)
- **Pre-print**: [10.1101/2025.04.10.25325611](https://doi.org/10.1101/2025.04.10.25325611)
- **Slides**: [Google Slides](https://docs.google.com/presentation/d/1BSdTEuZ_zKdU8tBFuRMmP7GwHht1D0oZSkaFWovz9ao/edit?slide=id.p)

Plan.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
TODO

Codebase and manuscript structure
- [ ] convert the full manuscript into Quarto
- [ ] incorporate results.qmd so that all main output is in a single document; either copy in the results directly, or use Quarto includes if copying compromises readability
- [ ] update CLAUDE.md to reflect structural changes

Code functionality
- [ ] check that the results pipeline runs
- [ ] update the figures
- [ ] fix variant code and data issues

Reporting
- [ ] create a paragraph-by-paragraph one-line summary in `Research-narrative.md`
- [ ] assess whether the flow of text matches the intended research narrative
- [ ] compare the text against the STROBE reporting guidelines and flag issues: https://resources.equator-network.org/reporting-guidelines/strobe/#summary
- [ ] add mathematical notation for the model in the main text, with explanation
- [ ] flag remaining reviewer comments and suggest actions
- [ ] refactor inline references to use Zotero citation keys (https://www.zotero.org/kathsherratt/library)

Bonus features:
- [ ] create a DAG to represent the modelling approach
- [ ] consider removing geographic specificity as an exposure
- [ ] consider identifying the simplest possible version of such a model that can demonstrate the approach, then building up to the more complex version

report/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
/.quarto/

report/Research-narrative.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
## Overall narrative

This gives a brief overview of the research problem and how this work intends to approach it.

For public health,

- Forecasts aim to be accurate relative to observed data: minimising error and maximising calibration.
- There are many forecasters and forecasting methods.
- Identifying which methods are most accurate, when, and why can help prioritise model development and interpretation.

However,

- Accuracy is difficult to compare when an unbalanced set of modellers forecasts across different targets with many possible confounding factors (for example, the expected difficulty, or predictability, of each target can influence both the modelling approach and the resulting score).
- The issue of an unbalanced sample can be handled by restricting the sample to ensure consistency, or by scoring with pairwise comparisons.
- The issue of confounding is typically handled by stratifying scores.
  - For example, age is almost universally treated as a confounder in epidemiological analyses and adjusted for.
  - Analogously, we might adjust for forecast horizon, as it inherently changes the system we are predicting.

We suggest that,

- Both of these issues can be handled together by fitting a model to forecast scores.
- This makes explicit the analyst's view of the relationships affecting forecast performance.
- In this work, we explore the effects of model structure and geographic specificity on forecast performance.
- We use a GAMM to account for a multi-model, multi-target forecast setting.

Specifically, our analysis uses a Generalized Additive Mixed Model (GAMM):

- **Response**: Weighted Interval Score (WIS) on log scale
- **Fixed effects of interest**:
  - Method (model structure: mechanistic, statistical, ensemble, etc.)
  - CountryTargets (geographic specificity: single vs multi-location)
- **Adjustment factors**:
  - Trend in incidence (cases or deaths)
  - Dominant variant phase in each location (random effect)
  - Horizon by model (factor smooth interaction)
  - Location-specific random effects
  - Model-specific random effects

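One way to write the model above in notation is sketched below. It is not necessarily the specification in `R/analysis-model.R`. Consider forecast i, made by model m(i) for location l(i), target week t(i), and horizon h_i. Its response y_i is the WIS on the log scale. The following are assumptions:

- the link function g
- the treatment of trend as a categorical term τ
- the indexing of variant phase v within location
- the Gaussian random-effect distributions

$$
g\big(\mathbb{E}[y_i]\big) = \beta_0 + \beta_{\text{Method}(m(i))} + \delta_{\text{CountryTargets}(m(i))} + \tau_{\text{trend}(i)} + f_{m(i)}(h_i) + b_{v(l(i),\,t(i)),\,l(i)} + b_{l(i)} + b_{m(i)}
$$

$$
b_{v,l} \sim \mathcal{N}(0, \sigma^2_{\text{var}}), \qquad b_{l} \sim \mathcal{N}(0, \sigma^2_{\text{loc}}), \qquad b_{m} \sim \mathcal{N}(0, \sigma^2_{\text{mod}})
$$

Here f_m is a penalized smooth of horizon specific to model m (the factor-smooth interaction). β and δ are the fixed effects of interest for model structure and geographic specificity.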
## Full manuscript summary
