Clarify or deprecate generic state_taxable_income and move TAXSIM v36 semantics to the right layer #7897
Description
Summary
state_taxable_income in policyengine-us does not look safe as a generic cross-state public variable.
At the moment it is:
- effectively unused inside core policyengine-us
- incomplete relative to actual modeled state tax bases
- coupled to policyengine-taxsim, which maps TAXSIM v36 to state_taxable_income
- misleading for downstream benchmarking/analysis because it silently returns 0 in some states that clearly have modeled state tax base logic
This looks less like a bug in state tax calculations themselves and more like an interface/ownership problem around the umbrella variable.
PE-US findings
1. state_taxable_income is just a hand-maintained umbrella list
The variable itself is only:
policyengine_us/variables/gov/states/tax/income/state_taxable_income.py
class state_taxable_income(Variable):
    ...
    adds = "gov.states.household.state_taxable_incomes"

The real behavior comes from:
policyengine_us/parameters/gov/states/household/state_taxable_incomes.yaml
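To make the failure mode concrete, here is a minimal toy sketch of how an umbrella variable that sums a hand-maintained list behaves. It is not PolicyEngine's implementation: `umbrella_sum`, the `xx_`/`yy_`/`zz_` variable names, and the household values are all hypothetical, and the only assumption carried over from the issue is that the umbrella adds up whatever the list names.

```python
from typing import Dict, List


def umbrella_sum(listed: List[str], values: Dict[str, float]) -> float:
    """Sum the listed component variables, treating absent ones as 0."""
    return sum(values.get(name, 0.0) for name in listed)


# Hypothetical list contents: state "zz" has a modeled taxable-income
# variable, but nobody added it to the hand-maintained list.
listed_components = ["xx_taxable_income", "yy_taxable_income"]
household = {"zz_taxable_income": 40_000.0}

# The umbrella silently reports 0 for this household.
incomplete_result = umbrella_sum(listed_components, household)

# Once the missing entry is listed, the same household reports its base.
complete_result = umbrella_sum(listed_components + ["zz_taxable_income"], household)
```

The point of the sketch is that nothing fails loudly: an omitted state is indistinguishable from a state with genuinely zero taxable income.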
2. The umbrella list is incomplete relative to actual state variables
The clearest concrete bug is New Hampshire:
- policyengine_us/variables/gov/states/nh/tax/income/nh_taxable_income.py exists
- policyengine_us/variables/gov/states/nh/tax/income/nh_income_tax_before_refundable_credits.py directly uses nh_taxable_income
- but nh_taxable_income is omitted from state_taxable_incomes.yaml
So the generic state_taxable_income umbrella is definitely missing at least one real modeled taxable-income variable.
Pennsylvania also looks wrong/inconsistent:
- policyengine_us/variables/gov/states/pa/tax/income/taxable_income/pa_total_taxable_income.py exists
- policyengine_us/variables/gov/states/pa/tax/income/taxable_income/pa_adjusted_taxable_income.py exists
- policyengine_us/variables/gov/states/pa/tax/income/forgiveness/pa_income_tax_before_forgiveness.py uses pa_adjusted_taxable_income
- but neither PA taxable-income variable is included in state_taxable_incomes.yaml
That means the generic umbrella currently reports 0 for PA even though PE-US clearly has state taxable-income concepts and uses them in the tax path.
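An audit for omissions like these could start from a simple set difference between variable names that look like taxable-income concepts and the entries in the umbrella list. The sketch below is one possible shape for that check. The helper name `unlisted_candidates` and the `xx_taxable_income` list entry are hypothetical, and the inputs are passed in as plain strings rather than read from the repository.

```python
from typing import Iterable, Set


def unlisted_candidates(variable_names: Iterable[str], listed: Iterable[str]) -> Set[str]:
    """Return names that look like state taxable-income variables
    but do not appear in the umbrella list."""
    listed_set = set(listed)
    return {
        name
        for name in variable_names
        if "taxable_income" in name
        and name != "state_taxable_income"  # skip the umbrella itself
        and name not in listed_set
    }


# Variable names cited in this issue, plus one hypothetical listed state.
variables = [
    "state_taxable_income",
    "nh_taxable_income",
    "pa_total_taxable_income",
    "pa_adjusted_taxable_income",
    "xx_taxable_income",
]
listed = ["xx_taxable_income"]  # hypothetical list contents

missing = unlisted_candidates(variables, listed)
```

A name-based filter like this only produces candidates for manual review. It would also flag components such as ma_part_b_taxable_income, where the omission may be deliberate.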
3. Some omissions are probably legitimate
Massachusetts looks like an intentional omission:
- state_taxable_incomes.yaml explicitly comments that MA has multiple taxable income variables
- policyengine_us/variables/gov/states/ma/tax/income/ma_income_tax_before_credits.py taxes multiple bases (ma_part_a_taxable_dividend_income, ma_part_a_taxable_capital_gains_income, ma_part_b_taxable_income, ma_part_c_taxable_income)
Washington also seems legitimate:
- policyengine_us/variables/gov/states/wa/tax/income/wa_income_tax_before_refundable_credits.py is just wa_capital_gains_tax
- there may not be a coherent generic WA "taxable income" concept to expose
So this is not just "add every omitted state".
4. The core state tax path does not use state_taxable_income
I could not find any internal PE-US use of state_taxable_income besides its own definition.
By contrast, the actual cross-state tax aggregators are:
- policyengine_us/parameters/gov/states/household/state_income_tax_before_refundable_credits.yaml
- policyengine_us/parameters/gov/states/household/state_refundable_credits.yaml
and final state tax is built in:
policyengine_us/variables/household/income/household/household_state_income_tax.py
So state_taxable_income does not appear to be part of core PE-US tax computation.
policyengine-taxsim coupling
policyengine-taxsim currently depends on this variable for TAXSIM output v36:
- policyengine_taxsim/config/variable_mappings.yaml
- v36 -> state_taxable_income
It is also surfaced in docs/UI:
- dashboard/src/constants/index.js
- README.md
There is also an older policyengine-taxsim emulator that assumes a "{state}_taxable_income" naming convention:
policyengine-taxsim/taxsim_emulator.py
So the current situation is:
- PE-US does not depend on state_taxable_income
- TAXSIM compatibility code does
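The "{state}_taxable_income" naming assumption can be checked against the variable names cited in this issue. The sketch below does not reproduce the emulator's actual lookup. `conventional_name` and `CITED_VARIABLES` are hypothetical, and the set contains only names mentioned above, not the full PE-US variable tree.

```python
def conventional_name(state_code: str) -> str:
    """Build the variable name the '{state}_taxable_income' convention implies."""
    return f"{state_code.lower()}_taxable_income"


# Only the variable names cited in this issue (not an exhaustive list).
CITED_VARIABLES = {
    "nh_taxable_income",
    "pa_total_taxable_income",
    "pa_adjusted_taxable_income",
    "ma_part_a_taxable_dividend_income",
    "ma_part_a_taxable_capital_gains_income",
    "ma_part_b_taxable_income",
    "ma_part_c_taxable_income",
}

# Does the conventional name match one of the cited variables?
matches = {
    state: conventional_name(state) in CITED_VARIABLES
    for state in ("NH", "PA", "MA")
}
```

Among the cited names, only NH fits the convention. The PA and MA variables use state-specific names, so a convention-based lookup would miss them.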
Downstream / benchmarking impact
This came up while investigating state-tax targets in PolicyBench.
Empirically, the variable is misleading in current outputs:
- in one 100-household sample, there were 23 households where state_taxable_income == 0 while state_income_tax_before_refundable_credits != 0
- every sampled PA household had state_taxable_income = 0 while pre-credit state tax was nonzero
- every sampled MA household had state_taxable_income = 0 while pre-credit state tax was nonzero
That makes state_taxable_income hard to interpret as a cross-state benchmark target.
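The consistency check behind those counts can be expressed as a small filter over per-household outputs. The sketch below assumes each household is a dict with a `state_code` key plus the two variables. The helper name and the three toy rows are hypothetical and are not the PolicyBench sample.

```python
from collections import Counter
from typing import Dict, List


def zero_base_nonzero_tax(rows: List[Dict]) -> Counter:
    """Count, per state, households with zero state_taxable_income
    but nonzero pre-credit state income tax."""
    flagged: Counter = Counter()
    for row in rows:
        if (
            row["state_taxable_income"] == 0
            and row["state_income_tax_before_refundable_credits"] != 0
        ):
            flagged[row["state_code"]] += 1
    return flagged


# Toy rows for illustration only.
rows = [
    {"state_code": "PA", "state_taxable_income": 0.0,
     "state_income_tax_before_refundable_credits": 1_200.0},
    {"state_code": "MA", "state_taxable_income": 0.0,
     "state_income_tax_before_refundable_credits": 2_500.0},
    {"state_code": "XX", "state_taxable_income": 30_000.0,
     "state_income_tax_before_refundable_credits": 900.0},
]

flagged = zero_base_nonzero_tax(rows)
```

Grouping the flagged households by state separates systematic omissions (every household in a state is flagged) from isolated edge cases.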
Suggested decision / audit
I think this needs an explicit decision rather than a small patch.
Questions to resolve:
- Should state_taxable_income remain a public generic PE-US variable at all?
- If yes, what is its intended semantics for states with:
  - multiple taxable-income bases (MA)
  - special tax bases (WA capital gains)
  - state-specific adjusted taxable income paths (PA)
- If no, should TAXSIM v36 logic live in policyengine-taxsim instead as a dedicated compatibility adapter, e.g. taxsim_v36/taxsim_state_taxable_income?
Strategy options
Option A: Keep state_taxable_income, but audit and define it properly
- Fix clear omissions like NH
- Decide what PA should map to
- Document which states intentionally return 0 / are undefined
- Clarify semantics for states with multiple or nonstandard tax bases
Option B: Deprecate/remove state_taxable_income from PE-US as a universal concept
- Treat it as not a real cross-state PE-US variable
- Move TAXSIM v36 semantics into policyengine-taxsim
- Keep PE-US focused on real policy concepts, not TAXSIM compatibility abstractions
Option C: Split concepts
- Keep PE-US state-specific variables only
- Add a separate adapter-level variable for TAXSIM / comparison workflows
My current lean
I lean toward B or C, not A.
Reason:
- there is already at least one objective omission bug (NH)
- but some omissions are legitimate because the concept itself is not universal
- PE-US core logic does not need this variable
- policyengine-taxsim appears to be the only meaningful consumer
So the cleanest architecture may be:
- deprecate state_taxable_income as a generic PE-US variable
- implement TAXSIM v36 semantics explicitly in policyengine-taxsim
- only keep/add PE-US public variables that correspond to actual cross-state policy concepts
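As a rough illustration of what an explicit adapter could look like, the sketch below keeps a per-state table and returns None where v36 is undefined instead of a silent 0. Everything in it is hypothetical: the `V36_COMPONENTS` table, the `taxsim_v36` function, and in particular the choice of pa_adjusted_taxable_income for PA, which this issue leaves open.

```python
from typing import Dict, Optional, Tuple

# Hypothetical adapter table. None marks states where v36 is treated as
# undefined rather than zero.
V36_COMPONENTS: Dict[str, Optional[Tuple[str, ...]]] = {
    "NH": ("nh_taxable_income",),
    "PA": ("pa_adjusted_taxable_income",),  # assumption: an open question in this issue
    "MA": None,  # multiple taxable-income bases
    "WA": None,  # capital gains tax only
}


def taxsim_v36(state_code: str, values: Dict[str, float]) -> Optional[float]:
    """Return the v36 value for a state, or None if it is explicitly undefined."""
    if state_code not in V36_COMPONENTS:
        # An unmapped state is an error, not an implicit zero.
        raise KeyError(f"No v36 mapping declared for {state_code}")
    components = V36_COMPONENTS[state_code]
    if components is None:
        return None
    return sum(values[name] for name in components)


nh_value = taxsim_v36("NH", {"nh_taxable_income": 5_000.0})
ma_value = taxsim_v36("MA", {})
```

The design choice here is that every state must be declared, so a new state cannot fall through to 0 by omission.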
If maintainers prefer, this can be split into:
- one PE-US issue for state_taxable_income
- one policyengine-taxsim issue for v36 ownership/mapping