Fix variant phase assignment with per-location approach #144

Merged: kathsherratt merged 4 commits into main from pr2-fix-variant-code on Feb 19, 2026

@kathsherratt
Contributor

Summary

  • Replace global median-date hack in variant phase classification with correct per-location approach
  • Aggregate variant data across sources (ECDC TESSy/GISAID) before dominance assessment, fixing spurious results from small-sample sources
  • Require >50% variant share and enforce chronological ordering to guard against noisy surveillance data
  • Replace Sys.Date() with the fixed study end date 2023-03-17 for reproducibility
  • Regenerate GAMM results (output/results.rds) and diagnostic plots
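
As an illustration of the per-location approach summarised above, the following is a minimal sketch in base R, not the repository's actual classify_variant_phases() code. The data frame, the location codes "AA" and "BB", the column names, and the helper first_dominant_week() are all hypothetical, and the percentages are invented.

```r
# Toy weekly variant shares (percent of sequenced samples); all values invented.
variants <- data.frame(
  location = rep(c("AA", "BB"), each = 4),
  week = rep(as.Date(c("2021-11-27", "2021-12-11", "2022-01-01", "2022-01-15")), times = 2),
  delta = c(95, 45, 10, 2, 99, 90, 60, 30),
  omicron = c(5, 55, 90, 98, 1, 10, 40, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first week in which `variant` strictly exceeds the
# threshold within a single location's data, or NA if it never does.
first_dominant_week <- function(df, variant, threshold = 50) {
  weeks <- df$week[df[[variant]] > threshold]
  if (length(weeks) == 0) {
    return(as.Date(NA))
  }
  min(weeks)
}

# Apply the rule separately to each location instead of using one global date.
by_location <- split(variants, variants$location)
omicron_start <- sapply(
  by_location,
  function(df) format(first_dominant_week(df, "omicron"))
)
print(omicron_start)
```

In this toy example the two locations get different Omicron start weeks, which is the behaviour a single global median date cannot represent.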

Key results change

With per-location variant phases providing better covariate adjustment, the Method and CountryTargets effects remain near zero. This is consistent with, and slightly strengthens, the original finding of no systematic performance difference between model structures.

Known limitation

Hungary (HU) starts with the Delta phase instead of Alpha because sparse early variant surveillance data leave no weeks with Alpha >50% in ECDC. This was initially accepted given the data resolution; the final commit in this PR replaces it with a manual override.

Test plan

  • classify_variant_phases() runs without errors
  • All 32 locations have variant phase assignments with no NAs
  • Omicron arrival dates differ by location (GB earliest Dec 11, SK/BG latest Jan 15)
  • GAMM fits successfully for both cases and deaths
  • quarto render report/results.qmd produces complete HTML

🤖 Generated with Claude Code

kathsherratt and others added 3 commits February 19, 2026 12:15
… guards

Replace the global median-date hack for variant phase classification
with the correct per-location approach, plus three data quality fixes:

1. Aggregate variant percentages across data sources (ECDC TESSy/GISAID)
   before finding dominant variant, avoiding spurious dominance from
   small-sample sources (fixes PT anomalous Omicron in March 2021)

2. Require >50% variant share before marking a phase transition,
   filtering out noise from sparse surveillance weeks

3. Enforce chronological ordering of variant phases (Alpha → Delta →
   Omicron-BA.1 → ...) to prevent out-of-sequence assignments
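
One way the ordering guard in point 3 could work is sketched below. This is an assumption about the rule, not the code in this commit: each phase may only start on or after the start of the previous phase in the sequence, so an isolated out-of-sequence week is ignored. The function phase_starts() and the toy input are hypothetical.

```r
# Assumed phase sequence, taken from the commit message and truncated for brevity.
phase_order <- c("Alpha", "Delta", "Omicron-BA.1", "Omicron-BA.2")

# Hypothetical helper: for one location, find the start week of each phase,
# only accepting weeks on or after the start of the preceding phase.
phase_starts <- function(weeks, dominant, phase_order) {
  starts <- rep(as.Date(NA), length(phase_order))
  names(starts) <- phase_order
  earliest <- min(weeks)
  for (p in phase_order) {
    candidates <- weeks[!is.na(dominant) & dominant == p & weeks >= earliest]
    if (length(candidates) > 0) {
      start <- min(candidates)
      starts[p] <- start
      earliest <- start  # later phases cannot begin before this one
    }
  }
  starts
}

# Toy input: NA marks weeks where no variant exceeded the 50% threshold.
# The fourth week is a spurious early "Omicron-BA.1" signal.
weeks <- as.Date("2021-01-02") + 7 * (0:7)
dominant <- c(NA, "Alpha", "Alpha", "Omicron-BA.1", "Alpha", "Delta", NA, "Omicron-BA.1")
starts <- phase_starts(weeks, dominant, phase_order)
print(starts)
```

Under this rule the early Omicron-BA.1 week is skipped because it precedes the Delta start, and a phase with no qualifying week is left as NA.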

Also:
- Fix Sys.Date() → as.Date("2023-03-17") for reproducibility
- Fix many-to-many join warning in Swiss hospital variant data
- Add library(ggplot2) to analysis-model.R for diagnostic plots
- Include PR 1 fixes in results.qmd (format: html, epi_target rename)
- Regenerate results.rds and diagnostic PDFs with updated variant phases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update manuscript methods to describe per-location variant phase
assignment with data sources and 50% threshold. Add detailed
methods description to Supplement after the variant heatmap.
Re-render Supplement PDF to reflect per-location variant phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two bugs caused incorrect variant phase assignments:

1. CH deduplication (get_variants_ch): The dedup filter discarded
   26 weeks of hospital data (BA.2+) for weeks where wastewater
   data also existed. Fixed by keeping both sources with distinct
   labels and letting the aggregation step handle overlap.

2. mean() vs sum() (set_variant_phases): Sub-variants mapped to the
   same phase (e.g., BA.4 + BA.5 → Omicron-BA.4/5) were averaged
   instead of summed, so BA.4/5 never exceeded the 50% threshold
   for GB. Fixed with two-step aggregation: sum within each source,
   then average across sources.
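
The difference between the two aggregations in point 2 can be shown with a small base R sketch. The source labels, percentages, and the phase_map lookup are invented for illustration; only the sum-then-average order follows the description above.

```r
# Toy shares for one location and week; all values invented.
shares <- data.frame(
  source = c("source_1", "source_1", "source_2", "source_2"),
  variant = c("BA.4", "BA.5", "BA.4", "BA.5"),
  percent = c(20, 45, 25, 40),
  stringsAsFactors = FALSE
)

# Hypothetical lookup mapping sub-variants to a shared phase label.
phase_map <- c("BA.4" = "Omicron-BA.4/5", "BA.5" = "Omicron-BA.4/5")
shares$phase <- unname(phase_map[shares$variant])

# Buggy behaviour: averaging all rows treats sub-variants as repeated
# measurements of the same quantity, diluting the combined share.
buggy <- mean(shares$percent)

# Step 1: sum sub-variants within each source.
per_source <- aggregate(percent ~ source + phase, data = shares, FUN = sum)

# Step 2: average the per-source totals across sources.
combined <- aggregate(percent ~ phase, data = per_source, FUN = mean)

print(buggy)
print(combined)
```

With these toy numbers the averaged value stays below 50 while the two-step value exceeds it, which is the pattern described for BA.4/5.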

Result: All 32 locations now show correct variant phase sequences
including BA.4/5. CH goes from 2 phases to 6; GB gains BA.4/5.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Hungary's sparse ECDC surveillance data caused Delta to backfill to the
start of the timeseries, missing the Alpha phase entirely. Added manual
override setting Alpha from study start and Delta from the week following
23 July 2021, based on epidemiological reporting.

Source: https://abouthungary.hu/news-in-brief/delta-and-gamma-variant-identified-in-hungary
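
A manual override of this kind might look like the sketch below. The function override_hu(), the column names, and the toy rows are hypothetical, and the sketch assumes each row is labelled by a week date so that weeks dated on or before 23 July 2021 count as Alpha.

```r
# Hypothetical override: relabel early Hungarian weeks as Alpha.
# Source for the cutoff date:
# https://abouthungary.hu/news-in-brief/delta-and-gamma-variant-identified-in-hungary
override_hu <- function(phases, cutoff = as.Date("2021-07-23")) {
  is_early_hu <- phases$location == "HU" & phases$week <= cutoff
  phases$phase[is_early_hu] <- "Alpha"
  phases
}

# Toy input in which Delta has been backfilled to the start for HU.
# "AA" is a made-up second location that should be left untouched.
phases <- data.frame(
  location = c(rep("HU", 4), "AA"),
  week = as.Date(c("2021-07-10", "2021-07-17", "2021-07-24", "2021-07-31", "2021-07-10")),
  phase = rep("Delta", 5),
  stringsAsFactors = FALSE
)

fixed <- override_hu(phases)
print(fixed)
```

Keeping the override in one small function with the source URL in a comment makes the manual exception easy to find and audit.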

Re-ran GAMM and re-rendered Supplement with updated heatmap and methods text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kathsherratt
Contributor Author

Prompts used to generate this PR

This PR was generated and iteratively refined using Claude Code (claude-opus-4-6). Below are the user prompts that drove the work, in chronological order.

Session 1 — Planning

each top-level bullet point in Plan.md indicates a discrete pull request. plan your approach to implementing these changes, for now ignoring the bonus features section.

In PR2, we will need to check data sources for variant phases in Switzerland and the UK, and ensure they are processed consistently. include this in the plan.

(During planning, user was asked about variant fix approach and chose "Switch to per-location (Recommended)")

Session 1 — PR 2 implementation

The initial per-location variant phase fix was implemented as planned: replacing the median-date hack with per-location dominant variant phases based on when each variant first exceeded 50% of sequenced samples. Commit: 2f6da1c.

Session 1 — Follow-up: methods text and Supplement

Before continuing the current plan, let's continue to improve PR#144. First, re-render the Supplement so that it shows the changes to variant phases, and save as a PDF. Then, suggest text to add to the methods section of the main manuscript to reflect how variant phases were calculated. If this needs more than 2-3 sentences to communicate, instead summarise in 1 sentence and add the detail to the Supplement (with a note in the main manuscript). When this is complete, stop and confirm before proceeding with the current plan.

yep

(Approved the manuscript text change)

Session 1 — Debugging CH/GB variant data

re variant phases - something wrong with swiss data processing; currently not showing variant phases for ba.2 onwards. check what this is and if it's relevant to gb data which also looks off.

(When asked whether to add to PR #144 or create a new PR, user chose "Add to PR #144". Claude investigated and identified two bugs: CH deduplication discarding hospital data for BA.2+ variants, and mean() instead of sum() preventing GB from detecting BA.4/5 dominance. User approved the fix plan.)

Session 2 — Rebase after PR #143 merge

PR 143 has been merged - check you are up to date with main.

(Approved force-push after clean rebase)

Session 2 — Hungary variant data fix

let's fix the issue with variant data in Hungary. The current data are missing (Hungary did not report variant data until later), which means the backfill method gives the Delta phase from the start of the timeseries.

I've identified that Delta started spreading from 23 July 2021 from this source: https://abouthungary.hu/news-in-brief/delta-and-gamma-variant-identified-in-hungary

Update the code so that Hungary is in the Alpha variant phase until Delta starts in the week following 23 July 2021. Add a comment in the code referencing the source. Then update the manuscript/supplement with appropriate text.

@kathsherratt
Contributor Author

kathsherratt commented Feb 19, 2026

Noting that this work was initially prompted by following Plan.md, and that the code was manually reviewed at each stage, as per the prompt history above.

@kathsherratt kathsherratt merged commit 3875105 into main Feb 19, 2026
1 check failed
@kathsherratt kathsherratt deleted the pr2-fix-variant-code branch February 19, 2026 13:13