From a476b45e0365564a28cee876f27ac18195319cac Mon Sep 17 00:00:00 2001
From: Vahid Ahmadi <va.vahidahmadi@gmail.com>
Date: Wed, 27 May 2026 12:12:19 +0200
Subject: [PATCH] Add effective take-up rate analysis plan (#1354)

#1354 asks for a comprehensive analysis of how the data processing
pipeline changes effective take-up rates away from the seed values.
The initial analysis (Max's gist) covers UC and CTC; the issue asks
to extend that to every benefit.

Persist the methodology in-repo so the analysis is repeatable and the
per-programme target list is stable.

The page enumerates 17 programmes with current take-up handling status
(modern input-only pattern for UC/PC; legacy formula-derived for
HB/IS/WTC/CTC pending the #1621 fix; reported-only for disability
benefits; no take-up step at all for State Pension and CTR), the
methodology (seed rate -> effective post-pipeline rate -> published
outturn share -> drift), and proposes a docs/book/validation/take-up-rates.ipynb
deliverable.

Cross-links to the pipeline-alignment plan because the
assign_takeup_with_reported_anchors port from policyengine-us-data is
the most likely systematic source of seed-vs-effective drift, and
landing it changes the numbers this analysis would produce.
---
 changelog.d/1354.md                           |   1 +
 docs/book/assumptions/takeup-analysis-plan.md | 119 ++++++++++++++++++
 2 files changed, 120 insertions(+)
 create mode 100644 changelog.d/1354.md
 create mode 100644 docs/book/assumptions/takeup-analysis-plan.md

diff --git a/changelog.d/1354.md b/changelog.d/1354.md
new file mode 100644
index 000000000..c74813275
--- /dev/null
+++ b/changelog.d/1354.md
@@ -0,0 +1 @@
+- Add an effective take-up rate analysis plan at `docs/book/assumptions/takeup-analysis-plan.md` cataloguing the per-programme take-up handling status (input-only vs legacy formula-derived), the methodology for measuring seed-vs-effective drift after reweighting and SPI integration, and the relationship to the wider pipeline-alignment work in #1621.
diff --git a/docs/book/assumptions/takeup-analysis-plan.md b/docs/book/assumptions/takeup-analysis-plan.md
new file mode 100644
index 000000000..617533c07
--- /dev/null
+++ b/docs/book/assumptions/takeup-analysis-plan.md
@@ -0,0 +1,119 @@
+# Effective take-up rate analysis plan
+
+```{note}
+**Planning page.** Tracks
+[#1354](https://github.com/PolicyEngine/policyengine-uk/issues/1354),
+which asks for a comprehensive analysis of how the data processing
+pipeline (reweighting, SPI integration) changes effective take-up
+rates away from the seed values in `policyengine-uk-data`. This page
+documents the methodology and per-programme target list.
+```
+
+## Why this matters
+
+PolicyEngine UK doesn't compute benefit take-up from scratch — it
+**seeds** stochastic take-up flags (`would_claim_*`) per programme in
+`policyengine-uk-data` from prior research-published target rates,
+then runs them through:
+
+1. **Reported anchoring** (currently inconsistent — only UC and PC use
+   the input-only pattern; HB / IS / WTC / CTC still derive at
+   runtime; see [pipeline alignment plan](./uk-pipeline-alignment-plan.md)).
+2. **Reweighting** to match aggregate caseload and expenditure
+   targets.
+3. **SPI integration** for income-tax-relevant variables (which can
+   shift the income distribution and therefore who falls into each
+   benefit's eligible population).
+
+By the end of the pipeline the *effective* take-up rate — the share of
+the eligible population that ends up with `would_claim_X == True` — can
+diverge from the seed rate by a few percentage points. The seeded rates
+themselves were calibrated against pre-reweighted aggregates, so the
+real interest is how the post-pipeline shares stack up against
+published outturn share figures.
+
+## Programmes to cover
+
+Per #1354, with current PolicyEngine UK take-up handling status:
+
+| Programme | Variable | Take-up input | Source of seed rate | Status |
+|-----------|----------|----------------|---------------------|--------|
+| Universal Credit | `universal_credit` | `would_claim_uc` (input-only) | DWP UC official statistics | Modern pattern |
+| Pension Credit | `pension_credit` | `would_claim_pension_credit` (input-only) | DWP Pension Credit take-up tables | Modern pattern |
+| Housing Benefit | `housing_benefit` | `would_claim_housing_benefit` (formula-derived) | `gov.dwp.housing_benefit.takeup` | Legacy pattern (#1621 item 1) |
+| Income Support | `income_support` | `would_claim_IS` (formula-derived) | `gov.dwp.income_support.takeup` | Legacy pattern (#1621 item 1) |
+| Child Tax Credit | `child_tax_credit` | `would_claim_CTC` (formula-derived) | `gov.dwp.tax_credits.child_tax_credit.takeup` | Legacy pattern; scheme ended 2025-04-06 |
+| Working Tax Credit | `working_tax_credit` | `would_claim_WTC` (formula-derived) | `gov.dwp.tax_credits.working_tax_credit.takeup` | Legacy pattern; scheme ended 2025-04-06 |
+| Tax-Free Childcare | `tax_free_childcare` | `would_claim_tfc` | HMRC TFC statistics | – |
+| Child Benefit | `child_benefit` | `would_claim_child_benefit` | `gov.hmrc.child_benefit.takeup.overall` + by_age | – |
+| Marriage Allowance | `marriage_allowance` | `would_claim_marriage_allowance` | `gov.hmrc.income_tax.allowances.marriage_allowance.takeup_rate` (0.5 post-2019 steady state per #623) | – |
+| Bursary Fund 16-19 | `bursary_fund_16_to_19` | `would_claim_bursary_fund_16_to_19` | DfE bursary statistics | – |
+| Adult Dependants Grant | – | `would_claim_adult_dependants_grant` | SFE statistics | – |
+| Travel Grant | – | `would_claim_travel_grant` | SFE statistics | – |
+| Extended Childcare Entitlement | `extended_childcare_entitlement` | `would_claim_extended_childcare` | DfE take-up | – |
+| Targeted Childcare Entitlement | `targeted_childcare_entitlement` | `would_claim_targeted_childcare` | DfE take-up | – |
+| Scottish Child Payment | `scottish_child_payment` | `would_claim_scp` | Social Security Scotland statistics | – |
+| State Pension | `state_pension` | n/a (effectively universal) | – | Universal — see [state-pension.md](../programs/gov/dwp/state-pension.md) |
+| PIP / DLA / AA / SDA / Carer's Allowance | various | currently coded as "reported = paid" | DWP caseload | No explicit take-up variable today; covered in [disability-legacy-benefits.md](../programs/gov/dwp/disability-legacy-benefits.md) |
+| Council Tax Reduction | `council_tax_benefit` | (no formula; reported-only) | – | Tracked in #1669 — rules-based formula in flight |
+
+## Methodology
+
+For each programme above, the analysis should produce:
+
+1. **Seed take-up rate** — what the relevant `*_takeup_rate` parameter
+   reports for the analysis year, plus the source (HMRC / DWP /
+   academic study) and vintage.
+2. **Effective take-up rate post-pipeline** — counted from the built
+   dataset as `sum(would_claim_X * weight) / sum(eligible_X * weight)`,
+   where `eligible_X` is computed from the runtime formulas.
+3. **Published outturn share** for the same year — DWP / HMRC
+   statistical publications giving claimant counts and either eligible
+   population counts (where DWP estimates them) or a published take-up
+   rate.
+4. **Pipeline drift** — `effective − seed` per programme, with a sign
+   convention so positive means more claimants than seeded.
+
+The published reference rates change roughly annually. The analysis
+should be repeatable, ideally as a notebook under
+[`docs/book/validation/`](../validation/) so future builds can re-run
+it and update the gaps table.
+
+## Output
+
+A new validation notebook
+`docs/book/validation/take-up-rates.ipynb` should:
+
+- pull each programme's seed take-up parameter,
+- compute the effective rate against the built dataset,
+- compare to the most recent published share (DWP take-up tables for
+  income-related benefits, HMRC TFC + tax credits statistics for the
+  rest),
+- flag drifts of more than ±2 percentage points for follow-up.
+
+The first run is the deliverable for #1354. After that the notebook
+becomes a part of the validation page and gets re-run on each EFO /
+data refresh.
+
+## Related work
+
+- The **takeup-anchoring port** from `policyengine-us-data`
+  (`assign_takeup_with_reported_anchors`, see [pipeline alignment
+  plan](./uk-pipeline-alignment-plan.md)) is the most likely cause of
+  systematic drift between seed and effective for the legacy-pattern
+  programmes. Landing that port first would change the numbers this
+  analysis produces.
+- The **`would_claim_*` input-only conversion** (also in #1621) would
+  give the analysis a cleaner ladder: with all of `would_claim_*` as
+  inputs, the effective rate is exactly the share assigned by the
+  stochastic step in `frs.py`, and pipeline drift becomes a pure
+  reweighting question.
+
+## References
+
+- Issue: [#1354](https://github.com/PolicyEngine/policyengine-uk/issues/1354).
+- Initial analysis (UC + CTC): https://gist.github.com/MaxGhenis/763db9278ddecdf310f160a73e138c8a
+- DWP, [Income-related benefits: estimates of take-up](https://www.gov.uk/government/collections/income-related-benefits-estimates-of-take-up) — primary reference rates for HB / IS / PC / UC.
+- HMRC, [Personal tax credits statistics](https://www.gov.uk/government/collections/personal-tax-credits-statistics).
+- HMRC, [Child Benefit statistics](https://www.gov.uk/government/collections/child-benefit-statistics) and Marriage Allowance statistics.
+- Cross-links: [`uk-pipeline-alignment-plan.md`](./uk-pipeline-alignment-plan.md), [`state-pension.md`](../programs/gov/dwp/state-pension.md), [`disability-legacy-benefits.md`](../programs/gov/dwp/disability-legacy-benefits.md), [`tax-credits.md`](../programs/gov/dwp/tax-credits.md).