Clarify or deprecate generic state_taxable_income and move TAXSIM v36 semantics to the right layer #7897

@MaxGhenis

Summary

state_taxable_income in policyengine-us does not look safe as a generic cross-state public variable.

At the moment it is:

  • effectively unused inside core policyengine-us
  • incomplete relative to actual modeled state tax bases
  • coupled to policyengine-taxsim, which maps TAXSIM v36 to state_taxable_income
  • misleading for downstream benchmarking/analysis because it silently returns 0 in some states that clearly have modeled state tax base logic

This looks less like a bug in state tax calculations themselves and more like an interface/ownership problem around the umbrella variable.

PE-US findings

1. state_taxable_income is just a hand-maintained umbrella list

The variable itself is only:

  • policyengine_us/variables/gov/states/tax/income/state_taxable_income.py
class state_taxable_income(Variable):
    ...
    adds = "gov.states.household.state_taxable_incomes"

The real behavior comes from:

  • policyengine_us/parameters/gov/states/household/state_taxable_incomes.yaml

2. The umbrella list is incomplete relative to actual state variables

The clearest concrete bug is New Hampshire:

  • policyengine_us/variables/gov/states/nh/tax/income/nh_taxable_income.py exists
  • policyengine_us/variables/gov/states/nh/tax/income/nh_income_tax_before_refundable_credits.py directly uses nh_taxable_income
  • but nh_taxable_income is omitted from state_taxable_incomes.yaml

So the generic state_taxable_income umbrella is definitely missing at least one real modeled taxable-income variable.

Pennsylvania also looks wrong/inconsistent:

  • policyengine_us/variables/gov/states/pa/tax/income/taxable_income/pa_total_taxable_income.py exists
  • policyengine_us/variables/gov/states/pa/tax/income/taxable_income/pa_adjusted_taxable_income.py exists
  • policyengine_us/variables/gov/states/pa/tax/income/forgiveness/pa_income_tax_before_forgiveness.py uses pa_adjusted_taxable_income
  • but neither PA taxable-income variable is included in state_taxable_incomes.yaml

That means the generic umbrella currently reports 0 for PA even though PE-US clearly has state taxable-income concepts and uses them in the tax path.

3. Some omissions are probably legitimate

Massachusetts looks like an intentional omission:

  • state_taxable_incomes.yaml explicitly comments that MA has multiple taxable income variables
  • policyengine_us/variables/gov/states/ma/tax/income/ma_income_tax_before_credits.py taxes multiple bases (ma_part_a_taxable_dividend_income, ma_part_a_taxable_capital_gains_income, ma_part_b_taxable_income, ma_part_c_taxable_income)

Washington also seems legitimate:

  • policyengine_us/variables/gov/states/wa/tax/income/wa_income_tax_before_refundable_credits.py is just wa_capital_gains_tax
  • there may not be a coherent generic WA "taxable income" concept to expose

So this is not just "add every omitted state".
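An audit along these lines could be sketched as below. This is a minimal illustration rather than a tool that exists in either repo: the two input lists are toy data (not the actual contents of state_taxable_incomes.yaml), and find_omissions and the naming pattern are hypothetical. It only flags candidates; whether a flagged variable is a bug (NH) or an intentional omission (MA) still needs a human decision.

```python
import re

# Toy inputs. In a real audit these would be read from
# state_taxable_incomes.yaml and from the variable files under gov/states/.
umbrella_entries = ["ca_taxable_income"]  # hypothetical sample entry
modeled_variables = [
    "ca_taxable_income",
    "nh_taxable_income",
    "pa_adjusted_taxable_income",
    "pa_total_taxable_income",
    "ma_part_b_taxable_income",
]

# Matches names such as "nh_taxable_income" or "pa_adjusted_taxable_income".
# The pattern is an assumption about naming, not a documented convention.
TAXABLE_INCOME_PATTERN = re.compile(r"^(?P<state>[a-z]{2})_(\w+_)?taxable_income$")


def find_omissions(umbrella, modeled):
    """Group taxable-income-like variables missing from the umbrella by state prefix."""
    listed = set(umbrella)
    omissions = {}
    for name in modeled:
        match = TAXABLE_INCOME_PATTERN.match(name)
        if match and name not in listed:
            omissions.setdefault(match.group("state"), []).append(name)
    return omissions


omissions = find_omissions(umbrella_entries, modeled_variables)
for state, names in sorted(omissions.items()):
    print(state, names)
```

Each flagged state would then be triaged as either "add to the umbrella" or "document as intentionally excluded".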

4. The core state tax path does not use state_taxable_income

I could not find any internal PE-US use of state_taxable_income besides its own definition.

By contrast, the actual cross-state tax aggregators are:

  • policyengine_us/parameters/gov/states/household/state_income_tax_before_refundable_credits.yaml
  • policyengine_us/parameters/gov/states/household/state_refundable_credits.yaml

and final state tax is built in:

  • policyengine_us/variables/household/income/household/household_state_income_tax.py

So state_taxable_income does not appear to be part of core PE-US tax computation.

policyengine-taxsim coupling

policyengine-taxsim currently depends on this variable for TAXSIM output v36:

  • policyengine_taxsim/config/variable_mappings.yaml
    • v36 -> state_taxable_income

It is also surfaced in docs/UI:

  • dashboard/src/constants/index.js
  • README.md

There is also an older policyengine-taxsim emulator that assumes a "{state}_taxable_income" naming convention:

  • policyengine-taxsim/taxsim_emulator.py

So the current situation is:

  • PE-US does not depend on state_taxable_income
  • TAXSIM compatibility code does

Downstream / benchmarking impact

This came up while investigating state-tax targets in PolicyBench.

Empirically, the variable is misleading in current outputs:

  • in one 100-household sample, there were 23 households where state_taxable_income == 0 while state_income_tax_before_refundable_credits != 0
  • every sampled PA household had state_taxable_income = 0 while pre-credit state tax was nonzero
  • every sampled MA household had state_taxable_income = 0 while pre-credit state tax was nonzero

That makes state_taxable_income hard to interpret as a cross-state benchmark target.
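The check behind those counts can be sketched roughly as follows. The records are made-up toy values, not the PolicyBench sample, and flag_inconsistent is a hypothetical helper; real values would come from a simulation run.

```python
from collections import Counter

# Toy per-household records (illustrative values only).
households = [
    {"state": "PA", "state_taxable_income": 0.0,
     "state_income_tax_before_refundable_credits": 1200.0},
    {"state": "MA", "state_taxable_income": 0.0,
     "state_income_tax_before_refundable_credits": 2500.0},
    {"state": "CA", "state_taxable_income": 40000.0,
     "state_income_tax_before_refundable_credits": 900.0},
    {"state": "TX", "state_taxable_income": 0.0,
     "state_income_tax_before_refundable_credits": 0.0},
]


def flag_inconsistent(rows):
    """Return rows with zero taxable income but nonzero pre-credit state tax."""
    return [
        row for row in rows
        if row["state_taxable_income"] == 0
        and row["state_income_tax_before_refundable_credits"] != 0
    ]


flagged = flag_inconsistent(households)
by_state = Counter(row["state"] for row in flagged)
print(len(flagged), dict(by_state))
```

A state where every sampled household is flagged (as with PA and MA above) points at the umbrella list rather than at household-level edge cases.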

Suggested decision / audit

I think this needs an explicit decision rather than a small patch.

Questions to resolve:

  1. Should state_taxable_income remain a public generic PE-US variable at all?
  2. If yes, what is its intended semantics for states with:
    • multiple taxable-income bases (MA)
    • special tax bases (WA capital gains)
    • state-specific adjusted taxable income paths (PA)
  3. If no, should TAXSIM v36 logic live in policyengine-taxsim instead as a dedicated compatibility adapter, e.g. taxsim_v36 / taxsim_state_taxable_income?

Strategy options

Option A: Keep state_taxable_income, but audit and define it properly

  • Fix clear omissions like NH
  • Decide what PA should map to
  • Document which states intentionally return 0 / are undefined
  • Clarify semantics for states with multiple or nonstandard tax bases

Option B: Deprecate/remove state_taxable_income from PE-US as a universal concept

  • Stop treating it as a real cross-state PE-US variable
  • Move TAXSIM v36 semantics into policyengine-taxsim
  • Keep PE-US focused on real policy concepts, not TAXSIM compatibility abstractions

Option C: Split concepts

  • Keep PE-US state-specific variables only
  • Add a separate adapter-level variable for TAXSIM / comparison workflows
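To make the adapter idea under B/C concrete, here is a rough sketch of what an explicit v36 mapping could look like on the policyengine-taxsim side. Everything here is hypothetical: V36_SOURCES and taxsim_v36 do not exist in either repo, and the PA entry is a placeholder, not a recommendation (the question of what PA should map to is still open). The point is only that each state gets an explicit decision, with None meaning "intentionally undefined" rather than a silent 0.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical per-state mapping from TAXSIM v36 to PE-US state variables.
# None marks states where no single taxable-income concept is exposed.
V36_SOURCES: Mapping[str, Optional[Tuple[str, ...]]] = {
    "NH": ("nh_taxable_income",),
    "PA": ("pa_adjusted_taxable_income",),  # placeholder choice, not settled
    "MA": None,  # multiple taxable-income bases
    "WA": None,  # capital gains tax only
}


def taxsim_v36(state: str, values: Mapping[str, float]) -> Optional[float]:
    """Return the v36 value for a state, or None if intentionally undefined.

    Raises KeyError for states with no explicit mapping decision, so that
    gaps surface as errors instead of zeros.
    """
    sources = V36_SOURCES[state]
    if sources is None:
        return None
    return sum(values[name] for name in sources)


print(taxsim_v36("NH", {"nh_taxable_income": 50000.0}))
print(taxsim_v36("MA", {}))
```

Raising on unmapped states is a deliberate choice in this sketch: it forces the audit to be completed state by state instead of defaulting to 0.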

My current lean

I lean toward B or C, not A.

Reason:

  • there is already at least one objective omission bug (NH)
  • but some omissions are legitimate because the concept itself is not universal
  • PE-US core logic does not need this variable
  • policyengine-taxsim appears to be the only meaningful consumer

So the cleanest architecture may be:

  • deprecate state_taxable_income as a generic PE-US variable
  • implement TAXSIM v36 semantics explicitly in policyengine-taxsim
  • only keep/add PE-US public variables that correspond to actual cross-state policy concepts

If maintainers prefer, this can be split into:

  • one PE-US issue for state_taxable_income
  • one policyengine-taxsim issue for v36 ownership/mapping
