124 changes: 124 additions & 0 deletions CLAUDE.md
# Project Overview

This is an R-based research project analyzing COVID-19 forecast accuracy across European models. The study examines how model structure and geographic specificity influence forecast performance using data from the European COVID-19 Forecast Hub.

## Research Question

How do model structure (mechanistic vs statistical) and geographic specificity (single-location vs multi-location models) affect forecast accuracy after adjusting for predictive difficulty?

See `report/Research-narrative.md` for additional project context.

## Project Structure

### Core Analysis Scripts (R/)

- **process-score.R**: Computes forecast scores using the `scoringutils` package
- Scores forecasts on both natural and log scales
- Calculates weighted interval scores (WIS)
- Outputs: `data/scores-raw-{case|death}.csv`
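The WIS combines the absolute error of the median with penalised interval scores across the central prediction intervals. A minimal base-R sketch of the calculation for a single forecast (illustrative only; the pipeline itself computes WIS via `scoringutils`):

```r
# Interval score for a central (1 - alpha) prediction interval:
# width penalty plus over-/under-prediction penalties.
interval_score <- function(y, lower, upper, alpha) {
  (upper - lower) +
    (2 / alpha) * pmax(lower - y, 0) +
    (2 / alpha) * pmax(y - upper, 0)
}

# WIS: weighted average of the median absolute error and the
# interval scores, with weights alpha_k / 2.
wis <- function(y, median, lower, upper, alpha) {
  K <- length(alpha)
  is_k <- interval_score(y, lower, upper, alpha)
  (0.5 * abs(y - median) + sum((alpha / 2) * is_k)) / (K + 0.5)
}

# One observation, three central intervals (50%, 80%, 95%):
score <- wis(
  y = 120, median = 100,
  lower = c(90, 80, 70), upper = c(110, 130, 150),
  alpha = c(0.5, 0.2, 0.05)
)
```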

- **process-data.R**: Data preparation and integration
- Combines scores with explanatory variables (model classification, variant phases, country targets)
- Calls utility functions for metadata, variants, and location data

- **analysis-model.R**: Main statistical analysis using Generalized Additive Mixed Models (GAMM)
- Models WIS adjusting for: trend, location, time, horizon, model-specific effects
- Isolates impact of Method (model structure) and CountryTargets (geographic specificity)
- Uses `mgcv`, `gammit`, and `gratia` packages
- Outputs: `output/results.rds`
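The kind of GAMM described here can be sketched with `mgcv` on simulated data. The variable names, smooth choices, and simulated response below are illustrative stand-ins, not the exact specification in `analysis-model.R`:

```r
library(mgcv)

# Simulated stand-in for the scores dataset
set.seed(1)
n <- 400
d <- data.frame(
  method   = factor(sample(c("mechanistic", "statistical"), n, replace = TRUE)),
  location = factor(sample(paste0("loc", 1:8), n, replace = TRUE)),
  model    = factor(sample(paste0("model", 1:6), n, replace = TRUE)),
  horizon  = sample(1:4, n, replace = TRUE)
)
d$log_wis <- rnorm(n) + 0.3 * (d$method == "statistical")

# Fixed effect of interest plus random-effect and factor-smooth adjustments
fit <- gam(
  log_wis ~ method +
    s(location, bs = "re") +              # location-specific random effect
    s(horizon, model, bs = "fs", k = 3),  # horizon trend varying by model
  data = d, method = "REML"
)
```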

- **analysis-descriptive.R**: Descriptive statistics and summary tables
- Bootstrap confidence intervals
- Score distributions by model characteristics
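A percentile bootstrap for a summary of scores can be sketched in base R as follows; this is a minimal illustration, and the implementation in `analysis-descriptive.R` may differ:

```r
set.seed(42)
scores <- rlnorm(200, meanlog = 2, sdlog = 0.5)  # stand-in WIS values

# Resample with replacement and take the median of each resample
boot_medians <- replicate(2000, median(sample(scores, replace = TRUE)))

# 95% percentile confidence interval for the median score
ci <- quantile(boot_medians, c(0.025, 0.975))
```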

- **plot-model-results.R**: Visualization of GAMM model effects
- Adjusted vs unadjusted effects by model
- Supports anonymized output for peer review

- **plot-model-flow.R**: Workflow and flowchart visualizations

### Utility Scripts (R/)

- **utils-data.R**: Functions for accessing forecasts, observations, and population data
- **utils-metadata.R**: Model names, submissions, and metadata classification helpers
- **utils-variants.R**: COVID-19 variant phase classification

### Data (data/)

- `covid19-forecast-hub-europe.parquet`: Raw forecast submissions
- `observed-{case|death}.csv`: Observed incidence data
- `model-classification.csv`: Model categorization by structure and specificity
- `populations.csv`: Population data by location
- `scores-raw-{case|death}.csv`: Computed forecast scores (generated)

### Manuscript text (prose and writing)

- `report/Revision_manuscript.md` — full manuscript text (title, abstract, background, methods, results, discussion). **Edit this file for any writing changes.**
- `report/Research-narrative.md` — Overall narrative of the research, and paragraph-by-paragraph one-line summary of the manuscript text
- `submission/reviewer-response-analysis.md` — tracks reviewer suggestions and planned response; X marks completion. Consult when making revision-related changes.

### Rendered analysis (code and outputs)

- `report/results.qmd` — active Quarto document; sources R scripts and renders figures/tables for the results section
- `report/supplement/Supplement.Rmd` — supplementary materials; sources the same R scripts
- `report/results.Rmd` — legacy RMarkdown copy of results (inactive; use `.qmd`)
- Pre-print: [medRxiv 10.1101/2025.04.10.25325611](https://doi.org/10.1101/2025.04.10.25325611)

**Note**: manuscript prose and rendered analysis are separate. `Revision_manuscript.md` is not auto-generated — changes to analysis code and changes to Manuscript text must be coordinated manually.

## Reproducing the Analysis

### Setup Environment

```r
# Install renv if needed
install.packages("renv")

# Restore package environment
renv::restore()
```

### Run Analysis Pipeline

```r
library(here)

# 1. Score forecasts on natural and log scales
source(here("R", "process-score.R"))

# 2. Prepare and integrate data
source(here("R", "process-data.R"))

# 3. Fit GAMM to weighted interval scores
source(here("R", "analysis-model.R"))

# 4. Generate reports
quarto::quarto_render(here("report", "results.qmd"))
rmarkdown::render(here("report", "supplement", "Supplement.Rmd"))
```

## Making Changes

| Task | Where to edit |
|---|---|
| Change manuscript prose (wording, framing, conclusions) | `report/Revision_manuscript.md` + update in `Research-narrative.md` |
| Change analysis, model, or figures | Relevant `R/` script; outputs flow into `results.qmd` automatically |
| Respond to a reviewer comment | Check `submission/reviewer-response-analysis.md`, update the relevant `R/` script if needed, then update `report/Revision_manuscript.md`, mark the item complete in `submission/reviewer-response-analysis.md`, and close the relevant GitHub Issue with a note |
| Add or change a supplementary figure | Relevant `R/` script + `report/supplement/Supplement.Rmd` |
| All changes | Update `Plan.md` |

## Dependencies

Major R packages:
- `mgcv` - Generalized Additive Models
- `gammit` - GAMM utilities
- `gratia` - GAM plotting
- `scoringutils` - Forecast scoring
- `arrow` - Parquet file handling
- `tidyverse` ecosystem (dplyr, tidyr, ggplot2, readr, purrr)
- `here` - Path management
- `lubridate` - Date handling

## Publications

- **DOI**: [10.5281/zenodo.14903161](https://doi.org/10.5281/zenodo.14903161)
- **Pre-print**: [10.1101/2025.04.10.25325611](https://doi.org/10.1101/2025.04.10.25325611)
- **Slides**: [Google Slides](https://docs.google.com/presentation/d/1BSdTEuZ_zKdU8tBFuRMmP7GwHht1D0oZSkaFWovz9ao/edit?slide=id.p)
24 changes: 24 additions & 0 deletions Plan.md
TODO

Codebase and manuscript structure
- [ ] convert full manuscript into quarto
- [ ] incorporate results.qmd so all main output is in a single document; either copy in results directly, or use quarto _includes if that compromises readability
- [ ] Update CLAUDE.md to reflect structural changes

Code functionality
- [ ] check results pipeline runs
- [ ] update the figures
- [ ] fix variant code and data issues

Reporting
- [ ] create paragraph-by-paragraph one line summary in the `Research-narrative.md`
- [ ] assess whether the flow of text matches with intended research narrative
- [ ] compare the text against the STROBE reporting guidelines and flag issues: https://resources.equator-network.org/reporting-guidelines/strobe/#summary
- [ ] Add mathematical notation for the model in the main text, with explanation
- [ ] flag remaining reviewer comments and suggest actions
- [ ] refactor inline referencing with zotero citation keys (https://www.zotero.org/kathsherratt/library)

Bonus features:
- [ ] create a DAG to represent modelling approach
- [ ] consider removing geographic specificity as an exposure
- [ ] consider identifying the simplest possible version of such a model that can be used to demonstrate the approach, and then build up to the more complex version
1 change: 1 addition & 0 deletions report/.gitignore
/.quarto/
40 changes: 40 additions & 0 deletions report/Research-narrative.md
## Overall narrative

This gives a brief overview of the research problem and how this work intends to approach it.

For public health,

- Forecasting aims to be accurate to observed data: minimising error and maximising calibration.
- There are many forecasters and forecasting methods
- Identifying which methods are most accurate, when, and why can help prioritise model development/interpretation

However,

- Accuracy is difficult to compare when an unbalanced set of modellers forecasts across different targets, with many possible confounding factors (the expected difficulty, or predictability, of each target influences both the modelling approach and the resulting score)
- The issue of an unbalanced sample can be handled by restricting the sample to ensure consistency, or scoring with pairwise comparison
- The issue of confounding is typically handled by stratifying scores
  - For example, age is almost universally treated as a confounder in epidemiological analyses and adjusted for
  - Analogously, we might adjust for forecast horizon, since horizon inherently changes the predictability of the system we are forecasting

We suggest that,

- Both of these issues can be handled together by fitting a model to forecast scores
- This explicitly identifies the analyst's view of relationships affecting forecast performance
- In this work, we explore the effects of model structure, and geographic specificity, on forecast performance
- We use a GAMM to account for a multi-model, multi-target forecast setting

Specifically, our analysis uses a Generalized Additive Mixed Model (GAMM):

- **Response**: Weighted Interval Score (WIS) on log scale
- **Fixed effects of interest**:
- Method (model structure: mechanistic, statistical, ensemble, etc.)
- CountryTargets (geographic specificity: single vs multi-location)
- **Adjustment factors**:
- Trend in incidence (cases or deaths)
- Dominant variant phase in each location (random effect)
- Horizon by model (factor smooth interaction)
- Location-specific (random effect)
- Model-specific random effects
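These components map onto an `mgcv`-style model formula roughly as follows; the variable names are schematic, not the project's actual column names:

```r
# Schematic formula mirroring the specification above:
# fixed effects of interest plus smooth/random-effect adjustments.
wis_formula <- log_wis ~
  Method + CountryTargets +         # fixed effects of interest
  s(Trend) +                        # smooth trend in incidence
  s(Variant, bs = "re") +           # variant phase random effect
  s(Horizon, Model, bs = "fs") +    # horizon by model factor smooth
  s(Location, bs = "re") +          # location-specific random effect
  s(Model, bs = "re")               # model-specific random effect
```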

## Full manuscript summary
