
Add partner YAML tests and a dedicated CI runner #8273

Open
hua7450 wants to merge 15 commits into PolicyEngine:main from hua7450:add-partner-yaml-tests

Conversation

Collaborator

@hua7450 hua7450 commented May 12, 2026

Summary

Adds a dedicated, easily-debuggable test layer that fails CI when a PR would change calculation results for any of PolicyEngine's household API partners. Three batches under policyengine_us/tests/policy/baseline/partners/, plus a standalone CI runner so partner-affecting regressions show up as their own named check.

1. Customer fixture mirrors (original scope)

Mirrors policyengine-household-api customer fixtures (Amplifi, Impactica, MyFriendBen) — one folder per partner, one file per year.

2. analytics_coverage/signatures/ — production request coverage (81 cases)

One case per unique input-variable shape partners actually sent through the household API (sourced from unique_signatures.csv — 180 unique signatures collapse to ~83 unique input shapes after deduplicating household-size variants). Grouped by state into 8 files. Each retained case is annotated with how many production variants it consolidates (e.g. signature_3 (54 reqs, +12 dupes)).
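
The consolidation described above can be sketched roughly as follows. This is a minimal illustration, not the script used for this PR: the record keys (`id`, `inputs`, `outputs`, `requests`) are hypothetical, and it assumes the retained case is the one listing the most output variables and that "reqs" refers to the retained signature's own request count.

```python
from collections import defaultdict


def collapse_signatures(signatures):
    """Group signatures by their set of input variable names; keep one per group.

    `signatures` is a list of dicts with hypothetical keys:
    "id", "inputs" (variable names), "outputs" (variable names), "requests" (int).
    """
    clusters = defaultdict(list)
    for sig in signatures:
        # Household-size variants repeat the same variable names for more
        # people, so a frozenset of names ignores entity counts.
        clusters[frozenset(sig["inputs"])].append(sig)

    kept = []
    for members in clusters.values():
        # Assumed rule: the representative requests the most distinct outputs.
        rep = max(members, key=lambda s: len(set(s["outputs"])))
        dupes = len(members) - 1
        label = f"{rep['id']} ({rep['requests']} reqs, +{dupes} dupes)"
        kept.append({**rep, "label": label})
    return kept
```

Keying on a `frozenset` of variable names is what makes a 2-person and a 4-person request with the same variables land in the same cluster.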

3. analytics_coverage/edge_cases/ — boundary tests (642 cases)

Two-level structure mirroring PolicyEngine's own gov/ layout:

analytics_coverage/
├── signatures/      ← per-state signature cases (8 files, 81 cases)
└── edge_cases/
    ├── federal/
    │   ├── tax_credits/   (eitc, ctc, cdcc)
    │   ├── nutrition/     (snap, wic, csfp, school_meals)
    │   ├── healthcare/    (medicaid, chip, aca_ptc, msp)
    │   ├── childcare/     (ccdf, head_start, early_head_start)
    │   ├── cash/          (ssi, tanf)
    │   ├── housing/       (housing_assistance)
    │   ├── utility/       (lifeline)
    │   └── composition/   (cross-program: tax_credits, cash_health)
    └── state/
        ├── ca/  (18 programs: ala_general_assistance, calworks_*, eitc, yctc, tanf, …)
        ├── co/  (7)
        ├── il/  (11)
        ├── la/  (1: general_relief)
        ├── ma/  (8)
        ├── nc/  (2: scca, tanf)
        ├── tx/  (5: ccs, dart, fpp, harris_rides, tanf)
        └── wa/  (3: working_families_tax_credit, tanf, tax_credits_composition)

For each program, cases target the actual binding thresholds (income at FPL boundaries, age cutoffs, asset limits, immigration status, household composition).

State-varying programs (9 files each: federal.yaml plus the 8 partner states):

  • SNAP (87 cases): per-state binding gross income limit (federal 130% FPL vs state BBCE), age 60 elderly boundary with assets between $3k/$4.5k, disability flip, immigration LPR/REFUGEE/UNDOCUMENTED (captures 2025-07 rule change), plus 6 size/composition variants in federal.yaml.
  • Medicaid / CHIP / ACA PTC / SSI / CCDF / housing / school_meals / Head Start / Early Head Start: state-varying eligibility (Medicaid expansion vs not, state copay schedules, FMR, regional rates) with state-invariant cases (federal MAGI categories, federal age cutoffs) deduplicated to federal.yaml.

Federal-only programs: eitc, cdcc, csfp, lifeline, wic, msp, composition/tax_credits — single federal.yaml, verified to produce identical outputs across all states.

State-specific programs: state TANF (7 states: CA, CO, IL, MA, NC, TX, WA), CA tax credits + CalWORKs + general assistance + utility programs (16), CO state credits + supplements (5), IL state programs (9), MA state programs (6), NC SCCA, TX childcare + transit (4), WA WFTC.

What this catches

Verified by deliberately introducing each change locally and observing which suites fail:

| Change | Caught by |
| --- | --- |
| Variable removed / renamed without alias | all 3 layers (simulation build fails) |
| Formula refactor that changes outputs | edge_cases (sometimes signatures + fixtures, depends on household composition) |
| Enum member removal | all 3 layers |
| value_type, definition_period, or entity change | all 3 layers |
| Transitive removal (a non-partner variable that partner outputs depend on) | all 3 layers via dependency graph |

Known gaps

  • Parameter-value changes that don't bind on an existing test threshold can pass undetected. Edge cases today bind on eligibility boundaries (income at FPL cutoffs, age 60/65, $3k/$4.5k assets). They don't yet exercise:
    • Benefit floors (e.g., TX TANF minimum_grant)
    • Allotment caps when the cap doesn't bind in tested households
    • Intermediate deduction amounts (only top-line outputs are checked)
  • Follow-up: add "binds on parameter X" cases for each modified parameter type.

Implementation notes

  • Snapshot path matches CI: outputs generated via SimulationBuilder.build_from_dict with set_default_period(year) — the same path the YAML test runner uses.
  • absolute_error_margin: 0.1: tight enough that a flipped boolean eligibility variable (encoded as 0/1, so off by a full 1) always fails the test, but loose enough to absorb float32 rounding noise.
  • county_str enum fix: prior versions had county_str: "DALLAS COUNTY" (human-readable). PolicyEngine silently ignored these because county_str has value_type=str — the actual county enum defaulted to ALAMEDA_COUNTY_CA regardless. Fixed: all county_str inputs converted to valid enum names (DALLAS_COUNTY_TX, ARAPAHOE_COUNTY_CO, etc.), and outputs recomputed so CalFresh, ACA PTC, and other county-dependent outputs match the actual county.
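  • Three-stage redundancy trim of edge cases: peak 1,642 → 1,202 (federal-only programs collapsed to federal.yaml) → 866 (composition variants kept only in federal.yaml) → 642 (state-invariant cases kept only in federal.yaml). Signature cases were trimmed separately, 177 → 81. Cumulative ~60% reduction in edge cases without losing meaningful coverage.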
  • State-aware edge case thresholds: SNAP binding gross income limit pulls each state's BBCE multiplier dynamically (gov.hhs.tanf.non_cash.income_limit.gross[STATE]); state TANF income boundary found via per-state binary search.
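
The per-state binary search mentioned in the last bullet might look roughly like the sketch below. It is illustrative only: `benefit` stands in for whatever callable runs a simulation and returns the state TANF amount for a given head `employment_income`, and the search assumes the benefit is positive at `lo`, zero at `hi`, and never becomes positive again once it reaches zero.

```python
def find_zero_benefit_income(benefit, lo=0, hi=100_000):
    """Smallest whole-dollar income at which benefit(income) is zero.

    `benefit` is a hypothetical callable (income -> benefit amount).
    Assumes benefit(lo) > 0, benefit(hi) == 0, and a single flip in between.
    """
    if benefit(lo) <= 0 or benefit(hi) > 0:
        raise ValueError("bracket [lo, hi] does not contain the flip")
    # Invariant: benefit(lo) > 0 and benefit(hi) == 0.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if benefit(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi
```

Edge cases can then be placed at the returned income and one dollar below it, so that a change to the threshold flips at least one expected output.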

CI integration

  • New dedicated job: Household API Partners (extracted from the Full Suite Baseline matrix into a standalone job in both pr.yaml and push.yaml).
  • Added to the Publish job's needs: in push.yaml so partner-test failures block release.

Test plan

  • All 81 analytics_coverage signature cases pass locally
  • All 642 edge_cases pass locally (4-way parallel run: 7:35 wall time)
  • 7 customer fixture YAMLs pass locally via make test-yaml-no-structural-other-partners
  • make format clean
  • Confirmed standalone Household API Partners runner picks up analytics_coverage/ and edge_cases/ under the new layout
  • CI passes on this PR

🤖 Generated with Claude Code

Mirror customer-household fixtures from policyengine-household-api
(Amplifi, Impactica, MyFriendBen) as PolicyEngine YAML tests under
policyengine_us/tests/policy/baseline/partners/. One folder per partner,
one file per year — multi-year fixtures split into separate cases.
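
As a rough illustration of the "one file per year" split, the sketch below regroups a multi-year fixture by year. The fixture shape here is a simplified assumption (`{variable: {year: value}}`); the real household API fixtures nest values under entities and people, and the actual conversion script is not shown in this PR.

```python
def split_by_year(fixture):
    """Regroup {variable: {year: value}} into {year: {variable: value}}.

    The flat input shape is a simplification for illustration.
    """
    per_year = {}
    for variable, by_year in fixture.items():
        for year, value in by_year.items():
            per_year.setdefault(year, {})[variable] = value
    # Sort so per-year files are emitted in a stable order.
    return dict(sorted(per_year.items()))
```

Each key of the returned dict would correspond to one YAML file in the partner's folder.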

Outputs are snapshot values built via the same SimulationBuilder path the
YAML test runner uses, so they match what CI checks. absolute_error_margin
is 0.1 to allow float32 noise without masking boolean false-vs-true flips.
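
To make the margin choice concrete, here is a small standalone sketch. It does not call the YAML test runner; `within_margin` is a hypothetical stand-in for an absolute-error comparison, and `to_float32` uses the standard `struct` module to emulate storing a value as float32.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def within_margin(actual, expected, margin=0.1):
    """Absolute-error comparison in the spirit of absolute_error_margin."""
    return abs(actual - expected) <= margin


# float32 storage perturbs the value slightly, but far less than 0.1.
noisy = to_float32(1234.56)
print(noisy != 1234.56, within_margin(noisy, 1234.56))

# A boolean flip is an error of 1.0, well outside the margin.
print(within_margin(1.0, 0.0))
```

A single float32 rounding step is off by at most half the gap between adjacent float32 values, which is 0.0625 for magnitudes below 2^21 (about 2.1 million), so dollar amounts in that range stay inside the margin. The flip side is that a genuine change smaller than 0.1 also passes.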

CI: new `partners` matrix group in pr.yaml and push.yaml runs only this
folder, so PRs that affect partner-facing behavior show a dedicated check.
Excluded `partners` from the existing `rest` runner to avoid double-running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.72%. Comparing base (8cb60e7) to head (eee0f91).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##              main    #8273       +/-   ##
============================================
- Coverage   100.00%   78.72%   -21.28%     
============================================
  Files            3     4723     +4720     
  Lines           63    68774    +68711     
  Branches         0      340      +340     
============================================
+ Hits            63    54142    +54079     
- Misses           0    14554    +14554     
- Partials         0       78       +78     
Flag Coverage Δ
unittests 78.72% <ø> (-21.28%) ⬇️


hua7450 and others added 10 commits May 12, 2026 15:09
- Case B: ages 33/2/4, employment_income 36400, rent 6720, childcare 2400
- Case C: ages 25/3/6, employment_income 8840, rent 900, childcare 6000

Both snapshot through SimulationBuilder so values match the YAML runner.
Variables from older PE-US versions that no longer exist (e.g.
medical_out_of_pocket_expenses) are dropped during conversion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
177 per-signature test cases generated from production partner request
analytics (unique_signatures.csv). Each case mirrors a distinct
(inputs, outputs) combination that partners have sent, grouped into
per-state files under partners/analytics_coverage/. Inputs use a single
canonical 4-person family template; cases include only the variables
each signature actually listed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
227 boundary cases per state under partners/analytics_coverage/edge_cases/.
For each state file, generates 15 SNAP cases (9 dimension + 6 composition
variants) and 10-12 state TANF cases (4-6 dimension + 6 composition).

SNAP dimensions: income at the state-specific binding gross limit (federal
130% FPL vs BBCE multiplier, whichever binds), age 60 elderly boundary with
assets between $3k/$4.5k, disability flip, and immigration status variants
(LPR/REFUGEE/UNDOCUMENTED — captures the 2025-07-01 rule change).
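
A minimal sketch of how a state-specific binding limit and its boundary incomes could be derived. The function names are hypothetical, the FPL amount is passed in rather than looked up, and it assumes that in a BBCE state the larger of the federal 130% multiplier and the state's BBCE multiplier is the one that binds.

```python
def binding_gross_limit(fpl, bbce_multiplier=None, federal_multiplier=1.30):
    """Gross income limit that binds for a household with poverty line `fpl`.

    Assumption: a BBCE state's multiplier replaces the federal 130% test
    when it is higher; states without BBCE pass bbce_multiplier=None.
    """
    multiplier = federal_multiplier
    if bbce_multiplier is not None:
        multiplier = max(federal_multiplier, bbce_multiplier)
    return fpl * multiplier


def boundary_incomes(limit, step=1.0):
    """Incomes just under, at, and just over the limit."""
    return [limit - step, limit, limit + step]
```

Generating the three incomes around each state's own limit is what lets a single template produce a case that actually flips eligibility in every state.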

State TANF dimensions: per-state dynamic income-threshold search (binary-search
the head employment_income that flips the state TANF benefit to zero) for
ca_tanf, co_tanf, il_tanf, ma_tafdc, nc_tanf, tx_tanf, wa_tanf, plus family
composition cases (no minor children, youngest age 17, only child 18).

Composition variants exercise size-1 (single adult / single elderly), size-2
(married couple no children / single parent + 1 child), size-3 (single parent
+ 2 children), and size-4 (couple + 2 children) households.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
576 additional cases across 8 federal programs, appended to existing
per-state edge_cases files. All 803 cases in the edge_cases folder pass
locally (29m27s).

Federal tax credits (per state, 26 cases each × 9 buckets = 234):
- EITC: phase-out start ($31,160 joint with kids 2026), investment income
  cap ($12,200), qualifying child age 18/19 boundary plus student carve-out,
  childless joint phase-out start ($18,140)
- CTC: qualifying child age 16/17, joint phase-out $400k, refundable
  phase-in floor $2,500
- CDCC: child age 12/13, AGI $15k phase-out start, 2026 amended joint
  second-phase-out at $150k, zero earned income

Federal cash + health (per state, 38 cases each × 9 buckets = 342):
- SSI: aged at 65, disability flip, resources at $2k limit, immigration
  variants (LPR / refugee / undocumented)
- MSP: Medicare prerequisite, income at QMB (100% FPL) / SLMB (120%) /
  QI (135%) tier boundaries
- Medicaid: age category boundaries (0→1 infant→young, 5→6 young→older,
  18→19 child→adult), pregnant adult, 138% FPL expansion boundary
- CHIP: age 18/19 cutoff, varied income levels for state-specific limits
- ACA PTC: 100% FPL minimum, 400% FPL hard cap return in 2026, employer
  ESHI offer flip

Plus 6 size/composition variants per program family per state for
size-1/2/3/4 household structure coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructure: edge_cases/{state}.yaml → edge_cases/{program}/{state}.yaml
across 64 program folders. Total 1,642 edge cases (up from 803).

New programs (~840 added cases):
- Federal nutrition + utilities: WIC, CSFP, school meals, Head Start,
  Early Head Start, Lifeline, housing assistance, CCDF
- State tax credits + supplements: ca_eitc, ca_yctc,
  ca_foster_youth_tax_credit, ca_cdcc, ca_renter_credit, ca_state_supplement,
  co_eitc_ctc, co_family_affordability_credit, co_care_worker_credit,
  co_state_supplement, co_oap, il_eitc, il_ctc, il_eitc_ctc, ma_eitc,
  ma_child_and_family_credit, ma_state_supplement,
  wa_working_families_tax_credit
- State utility + work-support: ca_care, ca_fera, ca_la_ez_save,
  ca_calworks_child_care, ca_calworks_stage_2/3, ca_capp,
  ca_ala_general_assistance, ca_riv_general_relief, ca_riv_share_payment,
  il_liheap, il_aabd, il_bcc, il_fpp, il_hbwd, il_mpe, ma_liheap, ma_eaedc,
  ma_mbta, nc_scca, tx_ccs, tx_dart, tx_fpp, tx_harris_rides,
  la_general_relief

Bug fix: county_str inputs across all partner YAMLs were using display
names ("DALLAS COUNTY", "ARAPAHOE COUNTY", etc.) instead of valid enum
names. PolicyEngine silently ignored these because county_str has
value_type=str — the actual county defaulted to ALAMEDA_COUNTY_CA
regardless. Converted all 133 affected files to use proper enum names
(DALLAS_COUNTY_TX, ARAPAHOE_COUNTY_CO, etc.) and recomputed all output
values so CalFresh / ACA PTC / other county-dependent outputs match the
actual county.
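
The display-name to enum-name conversion can be sketched as below. `county_enum_name` is a hypothetical helper, not the script used here, and it only handles the simple pattern shown in the examples above; names with unusual punctuation or suffixes should be validated against the real county enum before use.

```python
import re


def county_enum_name(display_name, state_code):
    """Turn "DALLAS COUNTY" + "TX" into "DALLAS_COUNTY_TX".

    Simplified rule: drop punctuation, uppercase, join words with
    underscores, and append the state code.
    """
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", display_name).upper()
    return "_".join(cleaned.split() + [state_code.upper()])


print(county_enum_name("DALLAS COUNTY", "TX"))    # DALLAS_COUNTY_TX
print(county_enum_name("ARAPAHOE COUNTY", "CO"))  # ARAPAHOE_COUNTY_CO
```

Because an invalid string is silently ignored rather than rejected, a check that every generated name is an actual enum member is worth adding wherever inputs like this are produced.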

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These 7 folders had identical case content across all 9 state files
because the underlying calculation is purely federal and doesn't depend
on state_code:
  - eitc (-72)
  - cdcc (-40)
  - csfp (-80)
  - lifeline (-64)
  - wic (-96)
  - msp (-40)
  - fed_tax_credits_composition (-48)

Kept only federal.yaml in each — total edge cases down from 1,642 to 1,202.

Folders with genuine state variation are preserved (snap, medicaid,
aca_ptc, ssi, ccdf, housing_assistance, chip, ctc, school_meals,
head_start, early_head_start, fed_cash_health_composition).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Composition variants (size_1_single_adult, size_2_couple_no_children,
size_3_single_parent_2_children, etc.) test aggregation logic across
household structures — a code path that's identical regardless of
state_code. Keeping the same 6 composition cases × 9 states = 54 cases
per program × 7 programs = 378 cases is mechanical replication.

Trim: keep composition variants only in federal.yaml; remove from
ca/co/il/la/ma/nc/tx/wa state files for these 7 folders:
  - snap (-48)
  - ccdf (-48)
  - housing_assistance (-48)
  - school_meals (-48)
  - head_start (-48)
  - early_head_start (-48)
  - fed_cash_health_composition (-48)

State files in these folders still cover the dimension cases (income
thresholds, age cutoffs, immigration status) where state policy
actually affects the result.

Total edge cases: 1,202 -> 866.
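
The running totals quoted across these trim commits can be cross-checked with a few lines of arithmetic. The per-folder counts are copied from the commit messages in this thread (the third list comes from the later state-invariant trim); the variable names are just labels for this sketch.

```python
# Cases removed per folder, as quoted in the commit messages.
# eitc, cdcc, csfp, lifeline, wic, msp, fed_tax_credits_composition
STAGE_1_FEDERAL_ONLY = [72, 40, 80, 64, 96, 40, 48]
# 6 composition cases x 8 state files, in each of 7 folders
STAGE_2_COMPOSITION = [6 * 8] * 7
# medicaid, ssi, ctc, ccdf, aca_ptc, housing_assistance, school_meals,
# head_start, early_head_start, chip
STAGE_3_STATE_INVARIANT = [48, 40, 32, 24, 16, 16, 16, 16, 8, 8]


def running_totals(start, *stages):
    """Return the case count after each trim stage."""
    totals = [start]
    for removed in stages:
        totals.append(totals[-1] - sum(removed))
    return totals


print(running_totals(1642, STAGE_1_FEDERAL_ONLY,
                     STAGE_2_COMPOSITION, STAGE_3_STATE_INVARIANT))
# [1642, 1202, 866, 642]
```

Note that the 378 figure above counts all 9 files per folder (6 × 9 × 7), while only the 8 state copies are removed (6 × 8 × 7 = 336), which is why the total drops by 336 rather than 378.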

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two trims after analyzing redundancy:

1. Signature cases (analytics_coverage/*.yaml): 177 -> 81
   For each cluster of CSV signatures sharing the same set of input
   variable names (ignoring entity-count / household-size), keep one
   representative (the one with the largest output bundle) and drop
   the household-size duplicates. Each retained case is annotated
   with "+N dupes" showing how many production variants it consolidates.

2. Edge cases (analytics_coverage/edge_cases/): 866 -> 642
   For each multi-state edge_cases folder, find cases whose computed
   output is byte-identical across all 9 state files (federal +
   8 states). Those cases test state-invariant logic (federal MAGI,
   federal age cutoffs, federal credit phase-outs) — replication
   across 9 states adds nothing. Keep them in federal.yaml only;
   drop from ca/co/il/la/ma/nc/tx/wa.

   Affected folders (state-invariant cases dropped from 8 state files):
     - medicaid (6 cases × 8 states = 48)
     - ssi (5 × 8 = 40)
     - ctc (4 × 8 = 32)
     - ccdf (3 × 8 = 24)
     - aca_ptc (2 × 8 = 16)
     - housing_assistance (2 × 8 = 16)
     - school_meals (2 × 8 = 16)
     - head_start (2 × 8 = 16)
     - early_head_start (1 × 8 = 8)
     - chip (1 × 8 = 8)

   State-varying cases (where state actually changes the answer —
   SNAP BBCE, Medicaid expansion, etc.) stay at full 9-state coverage.
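
The state-invariant trim can be sketched as follows. This is an approximation of the idea rather than the script that was run: it assumes each case has the same name in every file of a folder, models a folder as a plain dict, and uses dict equality on the outputs in place of the byte-level comparison described above.

```python
def trim_state_invariant(files):
    """Keep state-invariant cases only under "federal".

    `files` maps a file stem ("federal", "ca", ...) to {case_name: outputs}.
    A case is state-invariant when it exists in every file with equal outputs.
    """
    shared = set.intersection(*(set(cases) for cases in files.values()))
    invariant = {
        name
        for name in shared
        if all(cases[name] == files["federal"][name] for cases in files.values())
    }
    return {
        stem: {
            name: outputs
            for name, outputs in cases.items()
            if stem == "federal" or name not in invariant
        }
        for stem, cases in files.items()
    }
```

Cases whose output differs in even one state stay in every file, which matches the full 9-file coverage kept for state-varying logic.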

All 730 remaining partner tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the flat 64-folder edge_cases/ structure with a navigable two-level
layout that mirrors PolicyEngine's own (federal vs state) organization.
File contents and case names are unchanged — only paths move. All moves use
git mv so history is preserved.

New layout:
  edge_cases/
    federal/
      tax_credits/   (eitc, ctc, cdcc)
      nutrition/     (snap, wic, csfp, school_meals)
      healthcare/    (medicaid, chip, aca_ptc, msp)
      childcare/     (ccdf, head_start, early_head_start)
      cash/          (ssi, tanf)
      housing/       (housing_assistance)
      utility/       (lifeline)
      composition/   (tax_credits, cash_health)
    state/
      ca/, co/, il/, la/, ma/, nc/, tx/, wa/

State-specific programs lose their state prefix (ca_eitc/ca.yaml becomes
state/ca/eitc.yaml). TANF state files (tanf/{state}.yaml) move under each
state's folder. state_tax_credits_composition/ state files similarly move
to state/{state}/tax_credits_composition.yaml.
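
The three renaming rules in the paragraph above can be expressed as a small path-mapping function. This is a sketch for illustration (the real moves were done with git mv); `new_state_path` is a hypothetical name, and paths are relative to edge_cases/.

```python
from pathlib import PurePosixPath

STATES = {"ca", "co", "il", "la", "ma", "nc", "tx", "wa"}


def new_state_path(old_path):
    """Map an old flat path to the state/ layout, or None if no rule applies."""
    old = PurePosixPath(old_path)
    folder, state = old.parent.name, old.stem
    if state not in STATES:
        return None  # federal.yaml files follow the federal/ mapping instead
    if folder == "tanf":
        program = "tanf"
    elif folder == "state_tax_credits_composition":
        program = "tax_credits_composition"
    elif folder.startswith(state + "_"):
        program = folder[len(state) + 1:]  # drop the "ca_" style prefix
    else:
        return None  # federal program with per-state files, e.g. snap/ca.yaml
    return str(PurePosixPath("state") / state / f"{program}.yaml")


print(new_state_path("ca_eitc/ca.yaml"))  # state/ca/eitc.yaml
print(new_state_path("tanf/tx.yaml"))     # state/tx/tanf.yaml
```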

Also: extract the partner test runner from the Full Suite Baseline matrix
into a standalone CI job named "Household API Partners" (both pr.yaml and
push.yaml). Makes it obvious in PR status checks that this gate exists for
household API partner regressions, distinct from the general baseline suite.
Added HouseholdAPIPartners to the Publish job's `needs:` so partner-test
failures also block release.

⚠️ If "Full Suite - Baseline (partners)" was a required status check in
main branch protection, update the rule to "Household API Partners".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hua7450 hua7450 marked this pull request as ready for review May 14, 2026 03:43
hua7450 and others added 4 commits May 13, 2026 23:44
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symmetric structure with edge_cases/:
  analytics_coverage/
    ├── signatures/   ← per-state signature cases from production
    └── edge_cases/   ← hand-crafted boundary tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the state-invariant cases trim, three folders had ALL their
non-federal state files reduced to '[]' (zero test cases):
  - federal/housing/housing_assistance/{ca,co,il,la,ma,nc,tx,wa}.yaml
  - federal/composition/cash_health/{ca,co,il,la,ma,nc,tx,wa}.yaml
  - federal/childcare/ccdf/{ca,co,il,la,ma,nc,tx,wa}.yaml

These folders now collapse to federal.yaml only, matching how truly
state-invariant programs already live (eitc, lifeline, msp, etc.).
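
One way to detect folders in this situation is sketched below. It works on an in-memory mapping of path to file text so it can be run without the repository; the function name and input shape are assumptions for this example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def collapsible_folders(file_texts):
    """Folders whose non-federal YAML files all contain just '[]'.

    `file_texts` maps a relative path such as
    "federal/childcare/ccdf/ca.yaml" to that file's text.
    """
    by_folder = defaultdict(list)
    for path, text in file_texts.items():
        p = PurePosixPath(path)
        if p.name != "federal.yaml":
            by_folder[str(p.parent)].append(text)
    return sorted(
        folder
        for folder, texts in by_folder.items()
        if all(t.strip() == "[]" for t in texts)
    )
```

Folders returned by this check can drop their state files entirely, leaving federal.yaml as the only file.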

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hua7450 hua7450 requested a review from PavelMakarchuk May 14, 2026 17:16