Add partner YAML tests and a dedicated CI runner #8273
Open
hua7450 wants to merge 15 commits into
Mirror customer-household fixtures from policyengine-household-api (Amplifi, Impactica, MyFriendBen) as PolicyEngine YAML tests under policyengine_us/tests/policy/baseline/partners/. One folder per partner, one file per year; multi-year fixtures are split into separate cases.

Outputs are snapshot values built via the same SimulationBuilder path the YAML test runner uses, so they match what CI checks. absolute_error_margin is 0.1 to allow float32 noise without masking boolean false-vs-true flips.

CI: new `partners` matrix group in pr.yaml and push.yaml runs only this folder, so PRs that affect partner-facing behavior show a dedicated check. Excluded `partners` from the existing `rest` runner to avoid double-running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
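The margin choice above can be illustrated with a small sketch. It assumes the runner accepts a value when the absolute difference from the snapshot is at most the margin; `within_margin` and `to_float32` are hypothetical helpers written for this illustration, not PolicyEngine functions.

```python
import struct

ABSOLUTE_ERROR_MARGIN = 0.1  # value quoted in the commit message


def to_float32(value: float) -> float:
    """Round a Python float to the nearest float32 and back (stdlib only)."""
    return struct.unpack("f", struct.pack("f", value))[0]


def within_margin(expected: float, actual: float,
                  margin: float = ABSOLUTE_ERROR_MARGIN) -> bool:
    """Assumed comparison rule: pass when |actual - expected| <= margin."""
    return abs(float(actual) - float(expected)) <= margin


# float32 rounding of a dollar amount stays far inside the margin.
noise_ok = within_margin(1234.56, to_float32(1234.56))

# A boolean encoded as 0/1 differs by 1.0 when it flips, so it cannot pass.
flip_caught = not within_margin(1.0, 0.0)
```

Under that assumption, any margin below 1.0 would catch a boolean flip; 0.1 simply leaves little room for a real dollar-level change to hide.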
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #8273 +/- ##
============================================
- Coverage 100.00% 78.72% -21.28%
============================================
Files 3 4723 +4720
Lines 63 68774 +68711
Branches 0 340 +340
============================================
+ Hits 63 54142 +54079
- Misses 0 14554 +14554
- Partials 0 78 +78
- Case B: ages 33/2/4, employment_income 36400, rent 6720, childcare 2400
- Case C: ages 25/3/6, employment_income 8840, rent 900, childcare 6000

Both snapshot through SimulationBuilder so values match the YAML runner. Variables from older PE-US versions that no longer exist (e.g. medical_out_of_pocket_expenses) are dropped during conversion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
177 per-signature test cases generated from production partner request analytics (unique_signatures.csv). Each case mirrors a distinct (inputs, outputs) combination that partners have sent, grouped into per-state files under partners/analytics_coverage/.

Inputs use a single canonical 4-person family template; cases include only the variables each signature actually listed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…into add-partner-yaml-tests
227 boundary cases, written to per-state files under partners/analytics_coverage/edge_cases/. For each state file, generates 15 SNAP cases (9 dimension + 6 composition variants) and 10-12 state TANF cases (4-6 dimension + 6 composition).

SNAP dimensions:
- income at the state-specific binding gross limit (federal 130% FPL vs BBCE multiplier, whichever binds)
- age 60 elderly boundary with assets between $3k/$4.5k
- disability flip
- immigration status variants (LPR/REFUGEE/UNDOCUMENTED), which capture the 2025-07-01 rule change

State TANF dimensions:
- per-state dynamic income-threshold search (binary-search the head employment_income that flips the state TANF benefit to zero) for ca_tanf, co_tanf, il_tanf, ma_tafdc, nc_tanf, tx_tanf, wa_tanf
- family composition cases (no minor children, youngest age 17, only child 18)

Composition variants exercise size-1 (single adult / single elderly), size-2 (married couple no children / single parent + 1 child), size-3 (single parent + 2 children), and size-4 (couple + 2 children) households.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
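The per-state income-threshold search described above could look roughly like the following. This is a minimal sketch, assuming the benefit is positive at the lower bound, zero at the upper bound, and never turns positive again once it reaches zero; `find_zero_benefit_income` and `toy_tanf` are hypothetical stand-ins, and a real run would call a PolicyEngine simulation instead of the toy formula.

```python
from typing import Callable


def find_zero_benefit_income(benefit: Callable[[int], float],
                             lo: int = 0, hi: int = 200_000) -> int:
    """Smallest whole-dollar income in (lo, hi] at which the benefit is zero."""
    if benefit(lo) <= 0:
        raise ValueError("benefit must be positive at the lower bound")
    if benefit(hi) > 0:
        raise ValueError("benefit must be zero at the upper bound")
    # Invariant: benefit(lo) > 0 and benefit(hi) == 0.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if benefit(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


def toy_tanf(employment_income: int) -> float:
    """Hypothetical benefit: $6,000 grant reduced by 50 cents per dollar earned."""
    return max(0.0, 6_000 - 0.5 * employment_income)


threshold = find_zero_benefit_income(toy_tanf)
```

A boundary case would then set the head's employment_income at `threshold` and at `threshold - 1` so the snapshot records both sides of the flip.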
576 additional cases across 8 federal programs, appended to existing per-state edge_cases files. All 803 cases in the edge_cases folder pass locally (29m27s).

Federal tax credits (per state, 26 cases each × 9 buckets = 234):
- EITC: phase-out start ($31,160 joint with kids 2026), investment income cap ($12,200), qualifying child age 18/19 boundary plus student carve-out, childless joint phase-out start ($18,140)
- CTC: qualifying child age 16/17, joint phase-out $400k, refundable phase-in floor $2,500
- CDCC: child age 12/13, AGI $15k phase-out start, 2026 amended joint second-phase-out at $150k, zero earned income

Federal cash + health (per state, 38 cases each × 9 buckets = 342):
- SSI: aged at 65, disability flip, resources at $2k limit, immigration variants (LPR / refugee / undocumented)
- MSP: Medicare prerequisite, income at QMB (100% FPL) / SLMB (120%) / QI (135%) tier boundaries
- Medicaid: age category boundaries (0→1 infant→young, 5→6 young→older, 18→19 child→adult), pregnant adult, 138% FPL expansion boundary
- CHIP: age 18/19 cutoff, varied income levels for state-specific limits
- ACA PTC: 100% FPL minimum, 400% FPL hard cap return in 2026, employer ESHI offer flip

Plus 6 size/composition variants per program family per state for size-1/2/3/4 household structure coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
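As a worked example of the FPL-tier arithmetic behind the MSP cases, the sketch below turns the three percentages listed above into annual income boundaries. The guideline figures (a $15,650 base plus $5,500 per additional person) are illustrative assumptions passed in as parameters, not values read from PolicyEngine, and the sketch ignores income disregards that a real MSP test would apply.

```python
from typing import Dict

# Tier percentages as listed in the commit message.
MSP_TIER_PERCENT = {"QMB": 100, "SLMB": 120, "QI": 135}


def poverty_guideline(household_size: int, base: int = 15_650,
                      per_additional: int = 5_500) -> int:
    """Annual guideline; base and per_additional are assumed illustrative values."""
    return base + per_additional * (household_size - 1)


def msp_income_boundaries(household_size: int) -> Dict[str, float]:
    """Annual income at each tier boundary, before any disregards."""
    fpl = poverty_guideline(household_size)
    # Integer percentages keep the products exact before the final division.
    return {tier: fpl * pct / 100 for tier, pct in MSP_TIER_PERCENT.items()}


single_person = msp_income_boundaries(1)
```

Each boundary would typically yield two test cases, one just below and one just above the computed income.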
Restructure: edge_cases/{state}.yaml → edge_cases/{program}/{state}.yaml
across 64 program folders. Total 1,642 edge cases (up from 803).
New programs (~840 added cases):
- Federal nutrition + utilities: WIC, CSFP, school meals, Head Start,
Early Head Start, Lifeline, housing assistance, CCDF
- State tax credits + supplements: ca_eitc, ca_yctc,
ca_foster_youth_tax_credit, ca_cdcc, ca_renter_credit, ca_state_supplement,
co_eitc_ctc, co_family_affordability_credit, co_care_worker_credit,
co_state_supplement, co_oap, il_eitc, il_ctc, il_eitc_ctc, ma_eitc,
ma_child_and_family_credit, ma_state_supplement,
wa_working_families_tax_credit
- State utility + work-support: ca_care, ca_fera, ca_la_ez_save,
ca_calworks_child_care, ca_calworks_stage_2/3, ca_capp,
ca_ala_general_assistance, ca_riv_general_relief, ca_riv_share_payment,
il_liheap, il_aabd, il_bcc, il_fpp, il_hbwd, il_mpe, ma_liheap, ma_eaedc,
ma_mbta, nc_scca, tx_ccs, tx_dart, tx_fpp, tx_harris_rides,
la_general_relief
Bug fix: county_str inputs across all partner YAMLs were using display
names ("DALLAS COUNTY", "ARAPAHOE COUNTY", etc.) instead of valid enum
names. PolicyEngine silently ignored these because county_str has
value_type=str — the actual county defaulted to ALAMEDA_COUNTY_CA
regardless. Converted all 133 affected files to use proper enum names
(DALLAS_COUNTY_TX, ARAPAHOE_COUNTY_CO, etc.) and recomputed all output
values so CalFresh / ACA PTC / other county-dependent outputs match the
actual county.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
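The display-name conversion in the bug fix above can be sketched as follows. This assumes enum names follow the `NAME_COUNTY_ST` pattern shown in the commit message; `county_enum_name` and `KNOWN_COUNTIES` are hypothetical, and a real conversion would validate against PolicyEngine's actual county enum, since some names may not follow the pattern.

```python
import re

# Hypothetical stand-in for the real enum's member names.
KNOWN_COUNTIES = {"DALLAS_COUNTY_TX", "ARAPAHOE_COUNTY_CO", "ALAMEDA_COUNTY_CA"}


def county_enum_name(display_name: str, state_code: str) -> str:
    """Convert a display name such as 'DALLAS COUNTY' into an enum-style name."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", display_name.strip()).strip("_")
    candidate = f"{cleaned.upper()}_{state_code.upper()}"
    if candidate not in KNOWN_COUNTIES:
        # Fail loudly instead of letting the value fall back to a default county.
        raise ValueError(f"unknown county enum name: {candidate}")
    return candidate


dallas = county_enum_name("DALLAS COUNTY", "TX")
arapahoe = county_enum_name("Arapahoe County", "co")
```

Raising on an unknown name is the design point here: the original bug went unnoticed precisely because invalid strings were accepted silently.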
These 7 folders had identical case content across all 9 state files because the underlying calculation is purely federal and doesn't depend on state_code:
- eitc (-72)
- cdcc (-40)
- csfp (-80)
- lifeline (-64)
- wic (-96)
- msp (-40)
- fed_tax_credits_composition (-48)

Kept only federal.yaml in each; total edge cases down from 1,642 to 1,202.

Folders with genuine state variation are preserved (snap, medicaid, aca_ptc, ssi, ccdf, housing_assistance, chip, ctc, school_meals, head_start, early_head_start, fed_cash_health_composition).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
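One way to detect cases whose results do not depend on the state file is sketched below. It assumes each file has already been loaded into a mapping from case name to its output dictionary; `state_invariant_cases` and the sample values are hypothetical and only illustrate the comparison.

```python
from typing import Dict, Set

Outputs = Dict[str, Dict[str, float]]  # case name -> {variable: value}


def state_invariant_cases(outputs_by_file: Dict[str, Outputs]) -> Set[str]:
    """Names of cases present in every file with identical outputs in all of them."""
    files = list(outputs_by_file.values())
    if not files:
        return set()
    shared = set(files[0])
    for cases in files[1:]:
        shared &= set(cases)
    reference = files[0]
    return {
        name for name in shared
        if all(cases[name] == reference[name] for cases in files)
    }


# Made-up outputs: the EITC case matches everywhere, the SNAP case does not.
example = {
    "federal": {"eitc_phase_out": {"eitc": 4000.0}, "snap_limit": {"snap": 0.0}},
    "ca": {"eitc_phase_out": {"eitc": 4000.0}, "snap_limit": {"snap": 291.0}},
    "tx": {"eitc_phase_out": {"eitc": 4000.0}, "snap_limit": {"snap": 0.0}},
}
invariant = state_invariant_cases(example)
```

Cases in `invariant` are candidates to keep in federal.yaml only; anything else stays in the per-state files.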
Composition variants (size_1_single_adult, size_2_couple_no_children, size_3_single_parent_2_children, etc.) test aggregation logic across household structures, a code path that's identical regardless of state_code. Keeping the same 6 composition cases × 9 states = 54 cases per program × 7 programs = 378 cases is mechanical replication.

Trim: keep composition variants only in federal.yaml; remove from ca/co/il/la/ma/nc/tx/wa state files for these 7 folders:
- snap (-48)
- ccdf (-48)
- housing_assistance (-48)
- school_meals (-48)
- head_start (-48)
- early_head_start (-48)
- fed_cash_health_composition (-48)

State files in these folders still cover the dimension cases (income thresholds, age cutoffs, immigration status) where state policy actually affects the result.

Total edge cases: 1,202 -> 866.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two trims after analyzing redundancy:
1. Signature cases (analytics_coverage/*.yaml): 177 -> 81
For each cluster of CSV signatures sharing the same set of input
variable names (ignoring entity-count / household-size), keep one
representative (the one with the largest output bundle) and drop
the household-size duplicates. Each retained case is annotated
with "+N dupes" showing how many production variants it consolidates.
2. Edge cases (analytics_coverage/edge_cases/): 866 -> 642
For each multi-state edge_cases folder, find cases whose computed
output is byte-identical across all 9 state files (federal +
8 states). Those cases test state-invariant logic (federal MAGI,
federal age cutoffs, federal credit phase-outs) — replication
across 9 states adds nothing. Keep them in federal.yaml only;
drop from ca/co/il/la/ma/nc/tx/wa.
Affected folders (state-invariant cases dropped from 8 state files):
- medicaid (6 cases × 8 states = 48)
- ssi (5 × 8 = 40)
- ctc (4 × 8 = 32)
- ccdf (3 × 8 = 24)
- aca_ptc (2 × 8 = 16)
- housing_assistance (2 × 8 = 16)
- school_meals (2 × 8 = 16)
- head_start (2 × 8 = 16)
- early_head_start (1 × 8 = 8)
- chip (1 × 8 = 8)
State-varying cases (where state actually changes the answer —
SNAP BBCE, Medicaid expansion, etc.) stay at full 9-state coverage.
All 730 remaining partner tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
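The signature trim in item 1 above can be sketched as a clustering step. This assumes each signature is a record with a list of input variable names and a list of requested outputs; the field names and `dedupe_signatures` are hypothetical, and the real CSV layout may differ.

```python
from collections import defaultdict
from typing import Dict, List


def dedupe_signatures(signatures: List[Dict]) -> List[Dict]:
    """Keep one representative per set of input variable names."""
    clusters = defaultdict(list)
    for sig in signatures:
        # The frozenset ignores order and repeats, so household-size variants
        # that list the same variable names fall into the same cluster.
        clusters[frozenset(sig["inputs"])].append(sig)
    kept = []
    for members in clusters.values():
        # Representative: the member with the largest output bundle.
        best = max(members, key=lambda s: len(s["outputs"]))
        kept.append({**best, "dupes": len(members) - 1})
    return kept


sample = [
    {"id": "signature_1", "inputs": ["age", "employment_income"], "outputs": ["snap"]},
    {"id": "signature_2", "inputs": ["employment_income", "age"], "outputs": ["snap", "eitc"]},
    {"id": "signature_3", "inputs": ["age", "rent"], "outputs": ["snap"]},
]
kept = dedupe_signatures(sample)
```

The `dupes` count corresponds to the "+N dupes" annotation on each retained case.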
Replaces the flat 64-folder edge_cases/ structure with a navigable two-level
layout that mirrors PolicyEngine's own (federal vs state) organization.
File contents and case names are unchanged — only paths move. All moves use
git mv so history is preserved.
New layout:
edge_cases/
  federal/
    tax_credits/ (eitc, ctc, cdcc)
    nutrition/ (snap, wic, csfp, school_meals)
    healthcare/ (medicaid, chip, aca_ptc, msp)
    childcare/ (ccdf, head_start, early_head_start)
    cash/ (ssi, tanf)
    housing/ (housing_assistance)
    utility/ (lifeline)
    composition/ (tax_credits, cash_health)
  state/
    ca/, co/, il/, la/, ma/, nc/, tx/, wa/
State-specific programs lose their state prefix (ca_eitc/ca.yaml becomes
state/ca/eitc.yaml). TANF state files (tanf/{state}.yaml) move under each
state's folder. state_tax_credits_composition/ state files similarly move
to state/{state}/tax_credits_composition.yaml.
Also: extract the partner test runner from the Full Suite Baseline matrix
into a standalone CI job named "Household API Partners" (both pr.yaml and
push.yaml). Makes it obvious in PR status checks that this gate exists for
household API partner regressions, distinct from the general baseline suite.
Added HouseholdAPIPartners to the Publish job's `needs:` so partner-test
failures also block release.
⚠️ If "Full Suite - Baseline (partners)" was a required status check in
master branch protection, update the rule to "Household API Partners".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
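The path moves described in this commit can be sketched as a pure mapping function. It covers only the rules stated above plus a partial, hand-written category table; `new_path`, `STATES`, and `FEDERAL_CATEGORY` are hypothetical names, and the sketch assumes old paths have the form `{folder}/{file}.yaml` relative to edge_cases/.

```python
from pathlib import PurePosixPath

STATES = {"ca", "co", "il", "la", "ma", "nc", "tx", "wa"}

# Partial category table taken from the layout in the commit message.
FEDERAL_CATEGORY = {
    "eitc": "tax_credits", "snap": "nutrition", "medicaid": "healthcare",
    "ccdf": "childcare", "ssi": "cash", "tanf": "cash", "lifeline": "utility",
}


def new_path(old_path: str) -> str:
    """Map an old flat edge_cases path onto the two-level layout."""
    folder, filename = PurePosixPath(old_path).parts[-2:]
    stem = PurePosixPath(filename).stem
    if folder == "state_tax_credits_composition" and stem in STATES:
        return f"state/{stem}/tax_credits_composition.yaml"
    if folder == "tanf" and stem in STATES:
        return f"state/{stem}/tanf.yaml"
    prefix, _, program = folder.partition("_")
    if prefix in STATES and program:
        # State-specific programs lose their state prefix.
        return f"state/{prefix}/{program}.yaml"
    return f"federal/{FEDERAL_CATEGORY[folder]}/{folder}/{filename}"


moved = new_path("ca_eitc/ca.yaml")
```

Printing the old and new path for every file before running git mv gives a quick review of the whole restructure.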
Symmetric structure with edge_cases/:
analytics_coverage/
├── signatures/ ← per-state signature cases from production
└── edge_cases/ ← hand-crafted boundary tests
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the state-invariant cases trim, three folders had ALL their
non-federal state files reduced to '[]' (zero test cases):
- federal/housing/housing_assistance/{ca,co,il,la,ma,nc,tx,wa}.yaml
- federal/composition/cash_health/{ca,co,il,la,ma,nc,tx,wa}.yaml
- federal/childcare/ccdf/{ca,co,il,la,ma,nc,tx,wa}.yaml
These folders now collapse to federal.yaml only, matching how truly
state-invariant programs already live (eitc, lifeline, msp, etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds a dedicated, easily-debuggable test layer that fails CI when a PR would change calculation results for any of PolicyEngine's household API partners. Three batches under policyengine_us/tests/policy/baseline/partners/, plus a standalone CI runner so partner-affecting regressions show up as their own named check.

1. Customer fixture mirrors (original scope)
Mirrors policyengine-household-api customer fixtures (Amplifi, Impactica, MyFriendBen): one folder per partner, one file per year.

2. analytics_coverage/signatures/: production request coverage (81 cases)
One case per unique input-variable shape partners actually sent through the household API (sourced from unique_signatures.csv; 180 unique signatures collapse to ~83 unique input shapes after deduplicating household-size variants). Grouped by state into 8 files. Each retained case is annotated with how many production variants it consolidates (e.g. signature_3 (54 reqs, +12 dupes)).

3. analytics_coverage/edge_cases/: boundary tests (642 cases)
Two-level structure mirroring PolicyEngine's own gov/ layout:
For each program, cases target the actual binding thresholds (income at FPL boundaries, age cutoffs, asset limits, immigration status, household composition).

State-varying programs (9-state coverage each):

Federal-only programs: eitc, cdcc, csfp, lifeline, wic, msp, composition/tax_credits. Each has a single federal.yaml, verified to produce identical outputs across all states.

State-specific programs: state TANF (8 states), CA tax credits + CalWORKs + general assistance + utility programs (16), CO state credits + supplements (5), IL state programs (9), MA state programs (6), NC SCCA, TX childcare + transit (4), WA WFTC.

What this catches
Verified by deliberately introducing each change locally and observing which suites fail:
value_type, definition_period, or entity change

Known gaps
minimum_grant)

Implementation notes
- SimulationBuilder.build_from_dict with set_default_period(year): the same path the YAML test runner uses.
- absolute_error_margin: 0.1: tight enough that boolean eligibility variables (encoded as 0/1) can't silently pass when they flip between true and false, but loose enough to absorb float32 rounding noise.
- county_str enum fix: prior versions had county_str: "DALLAS COUNTY" (human-readable). PolicyEngine silently ignored these because county_str has value_type=str, so the actual county enum defaulted to ALAMEDA_COUNTY_CA regardless. Fixed: all county_str inputs converted to valid enum names (DALLAS_COUNTY_TX, ARAPAHOE_COUNTY_CO, etc.), and outputs recomputed so CalFresh, ACA PTC, and other county-dependent outputs match the actual county.
- gov.hhs.tanf.non_cash.income_limit.gross[STATE]); state TANF income boundary found via per-state binary search.

CI integration
- Household API Partners (extracted from the Full Suite Baseline matrix into a standalone job in both pr.yaml and push.yaml).
- Publish job's needs: in push.yaml so partner-test failures block release.

Test plan
- make test-yaml-no-structural-other-partners
- make format clean
- Household API Partners runner picks up analytics_coverage/ and edge_cases/ under the new layout

🤖 Generated with Claude Code