# Project Overview

This is an R-based research project analyzing COVID-19 forecast accuracy across European models. The study examines how model structure and geographic specificity influence forecast performance using data from the European COVID-19 Forecast Hub.

## Research Question

How do model structure (mechanistic vs statistical) and geographic specificity (single-location vs multi-location models) affect forecast accuracy after adjusting for predictive difficulty?

See `report/Research-narrative.md` for additional project context.

## Project Structure

### Core Analysis Scripts (R/)

- **process-score.R**: Computes forecast scores using the `scoringutils` package
  - Scores forecasts on both natural and log scales
  - Calculates weighted interval scores (WIS)
  - Outputs: `data/scores-raw-{case|death}.csv`

- **process-data.R**: Data preparation and integration
  - Combines scores with explanatory variables (model classification, variant phases, country targets)
  - Calls utility functions for metadata, variant, and location data

- **analysis-model.R**: Main statistical analysis using generalized additive mixed models (GAMMs)
  - Models WIS while adjusting for trend, location, time, horizon, and model-specific effects
  - Isolates the effects of `Method` (model structure) and `CountryTargets` (geographic specificity)
  - Uses the `mgcv`, `gammit`, and `gratia` packages
  - Outputs: `output/results.rds`

- **analysis-descriptive.R**: Descriptive statistics and summary tables
  - Bootstrap confidence intervals
  - Score distributions by model characteristics

- **plot-model-results.R**: Visualization of GAMM effects
  - Adjusted vs unadjusted effects by model
  - Supports anonymized output for peer review

- **plot-model-flow.R**: Workflow and flowchart visualizations
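
In `process-score.R` the WIS is computed by `scoringutils`. To illustrate what that score measures, here is a minimal base-R sketch of the standard WIS definition for a single quantile forecast: half the absolute error of the median plus a weighted interval score for each central prediction interval, normalized by K + 1/2. It is not the package's implementation. It assumes symmetric quantile levels that include the median, and the `log(x + 1)` transform used for the log-scale score (the offset of 1 accommodates zero counts) is an assumption, not necessarily the project's choice.

```r
# Minimal sketch of the weighted interval score (WIS) for one quantile
# forecast. Not the scoringutils implementation; assumes symmetric
# quantile levels (probs) that include the median (0.5).

# Interval score of a central (1 - alpha) prediction interval [lower, upper]
interval_score <- function(lower, upper, y, alpha) {
  (upper - lower) +                               # interval width (sharpness)
    (2 / alpha) * (lower - y) * (y < lower) +     # penalty if y falls below
    (2 / alpha) * (y - upper) * (y > upper)       # penalty if y falls above
}

wis <- function(quantiles, probs, y) {
  med <- quantiles[probs == 0.5]
  lower_probs <- sort(probs[probs < 0.5])
  alphas <- 2 * lower_probs                       # e.g. level 0.05 -> alpha 0.1
  K <- length(alphas)
  total <- 0.5 * abs(y - med)                     # median term, weight 1/2
  for (k in seq_len(K)) {
    lower <- quantiles[probs == lower_probs[k]]
    # match the upper level with a tolerance to avoid floating-point issues
    upper <- quantiles[abs(probs - (1 - lower_probs[k])) < 1e-9]
    total <- total + (alphas[k] / 2) * interval_score(lower, upper, y, alphas[k])
  }
  total / (K + 0.5)
}

# Example: a 5-quantile forecast and an observed value of 120
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
q <- c(80, 95, 100, 110, 130)

wis(q, probs, y = 120)                      # natural scale: 10.5
wis(log(q + 1), probs, y = log(120 + 1))    # log scale (log(x + 1) assumed)
```

Weighting each interval term by alpha / 2 is what makes the WIS approximate the continuous ranked probability score as more quantile levels are added.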

### Utility Scripts (R/)

- **utils-data.R**: Functions for accessing forecasts, observations, and population data
- **utils-metadata.R**: Model names, submissions, and metadata classification helpers
- **utils-variants.R**: COVID-19 variant phase classification

### Data (data/)

- `covid19-forecast-hub-europe.parquet`: Raw forecast submissions
- `observed-{case|death}.csv`: Observed incidence data
- `model-classification.csv`: Model categorization by structure and specificity
- `populations.csv`: Population data by location
- `scores-raw-{case|death}.csv`: Computed forecast scores (generated)
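
`process-data.R` combines the score files with explanatory variables such as those in `model-classification.csv`. The sketch below shows that kind of join in base R using small hypothetical tables. Apart from `Method` and `CountryTargets`, every column name and value is an assumption for illustration, not the actual schema of these files.

```r
# Hypothetical stand-ins for data/scores-raw-case.csv and
# data/model-classification.csv (the real schemas may differ)
scores <- data.frame(
  model    = c("model_A", "model_A", "model_B"),
  location = c("DE", "FR", "DE"),
  horizon  = c(1, 2, 1),
  wis      = c(12.0, 30.5, 18.2)
)
classification <- data.frame(
  model          = c("model_A", "model_B"),
  Method         = c("Mechanistic", "Statistical"),
  CountryTargets = c("Multi-location", "Single-location")
)

# Left join: keep every score row and attach its model's classification
combined <- merge(scores, classification, by = "model", all.x = TRUE)

# Unadjusted mean WIS by model structure
by_method <- aggregate(wis ~ Method, data = combined, FUN = mean)
by_method
```

A left join (`all.x = TRUE`) keeps scores from models missing in the classification table, so unclassified models show up as `NA` rather than silently disappearing.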

### Manuscript text (prose and writing)

- `report/Revision_manuscript.md`: full manuscript text (title, abstract, background, methods, results, discussion). **Edit this file for any writing changes.**
- `report/Research-narrative.md`: overall narrative of the research, plus a one-line summary of each paragraph of the manuscript text
- `submission/reviewer-response-analysis.md`: tracks reviewer suggestions and the planned responses; an `X` marks completed items. Consult it when making revision-related changes.

### Rendered analysis (code and outputs)

- `report/results.qmd`: active Quarto document; sources the R scripts and renders figures and tables for the results section
- `report/supplement/Supplement.Rmd`: supplementary materials; sources the same R scripts
- `report/results.Rmd`: legacy R Markdown copy of the results (inactive; use `results.qmd`)
- Pre-print: [medRxiv 10.1101/2025.04.10.25325611](https://doi.org/10.1101/2025.04.10.25325611)

**Note**: manuscript prose and rendered analysis are separate. `Revision_manuscript.md` is not auto-generated, so changes to the analysis code and changes to the manuscript text must be coordinated manually.

## Reproducing the Analysis

### Setup Environment

```r
# Install renv if it is not already installed
if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")

# Restore the package environment recorded in renv.lock
renv::restore()
```

### Run Analysis Pipeline

```r
library(here)

# 1. Score forecasts on natural and log scales
source(here("R", "process-score.R"))

# 2. Prepare and integrate data
source(here("R", "process-data.R"))

# 3. Fit GAMM to weighted interval scores
source(here("R", "analysis-model.R"))

# 4. Generate reports
# Render report/results.qmd from a shell at the project root:
#   quarto render report/results.qmd
# Knit report/supplement/Supplement.Rmd
rmarkdown::render(here("report", "supplement", "Supplement.Rmd"))
```
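
Step 3 fits a GAMM to the weighted interval scores. The toy sketch below shows the general shape of such a model with `mgcv::gam()`: fixed effects for `Method` and `CountryTargets`, smooths for time and horizon, and random intercepts for location and model via `bs = "re"`. The simulated data, the Gaussian response on log WIS, the `Trend` factor, and all column names other than `Method` and `CountryTargets` are assumptions for illustration. This is not the specification in `analysis-model.R`, which additionally uses `gammit` and `gratia` to extract and plot effects.

```r
library(mgcv)

set.seed(1)
n <- 800

# Hypothetical model-level attributes for 8 toy models
models <- data.frame(
  model          = factor(sprintf("model_%d", 1:8)),
  Method         = factor(rep(c("Mechanistic", "Statistical"), times = 4)),
  CountryTargets = factor(rep(c("Single-location", "Multi-location"), each = 4))
)

# Simulated forecast-level rows; each row inherits its model's attributes
idx <- sample(nrow(models), n, replace = TRUE)
toy <- data.frame(
  models[idx, ],
  location = factor(sample(sprintf("loc_%d", 1:6), n, replace = TRUE)),
  Trend    = factor(sample(c("Decreasing", "Stable", "Increasing"), n, replace = TRUE)),
  horizon  = sample(1:4, n, replace = TRUE),
  week     = sample(1:52, n, replace = TRUE)
)
toy$log_wis <- 1 + 0.3 * toy$horizon + 0.5 * sin(toy$week / 8) + rnorm(n, sd = 0.5)

fit <- gam(
  log_wis ~ Method + CountryTargets + Trend +
    s(week, k = 10) +          # smooth effect of time
    s(horizon, k = 3) +        # smooth effect of forecast horizon
    s(location, bs = "re") +   # random intercept per location
    s(model, bs = "re"),       # random intercept per model
  data   = toy,
  method = "REML"
)

# Parametric (fixed-effect) estimates, including Method and CountryTargets
round(summary(fit)$p.table, 3)
```

Because `Method` and `CountryTargets` vary only between models, the per-model random intercept absorbs model-specific performance that these two attributes do not explain, which is one way to separate structural effects from individual model quality.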

## Making Changes

| Task | Where to edit |
|---|---|
| Change manuscript prose (wording, framing, conclusions) | `report/Revision_manuscript.md`, then update `report/Research-narrative.md` to match |
| Change analysis, model, or figures | Relevant `R/` script; `results.qmd` sources the scripts, so re-rendering it picks up the new outputs |
| Respond to a reviewer comment | Check `report/Revision_reviews-response.md`, update the `R/` script if needed, then update `report/Revision_manuscript.md`, mark the item as completed in `report/Revision_reviews-response.md`, and close the relevant GitHub issue with a note |
| Add or change a supplementary figure | Relevant `R/` script + `report/supplement/Supplement.Rmd` |
| All changes | Update `Plan.md` |

## Dependencies

Major R packages:

- `mgcv` - Generalized Additive Models
- `gammit` - GAMM utilities
- `gratia` - GAM plotting
- `scoringutils` - Forecast scoring
- `arrow` - Parquet file handling
- `tidyverse` ecosystem (dplyr, tidyr, ggplot2, readr, purrr)
- `here` - Path management
- `lubridate` - Date handling
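
After `renv::restore()`, a quick check like the one below reports which of the packages listed above cannot be loaded. It is a convenience sketch, not part of the project's scripts; the package vector simply mirrors the list above, with the tidyverse entry expanded into its named members.

```r
# Packages from the Dependencies list (tidyverse expanded to its members)
required <- c("mgcv", "gammit", "gratia", "scoringutils", "arrow",
              "dplyr", "tidyr", "ggplot2", "readr", "purrr",
              "here", "lubridate")

# TRUE for each package whose namespace can be loaded
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All major packages are available.")
}
```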

## Publications

- **DOI**: [10.5281/zenodo.14903161](https://doi.org/10.5281/zenodo.14903161)
- **Pre-print**: [10.1101/2025.04.10.25325611](https://doi.org/10.1101/2025.04.10.25325611)
- **Slides**: [Google Slides](https://docs.google.com/presentation/d/1BSdTEuZ_zKdU8tBFuRMmP7GwHht1D0oZSkaFWovz9ao/edit?slide=id.p)