Commit ca7e692

MaxGhenis and claude committed
Save all geography variables to CD-stacked datasets
Updates stacked_dataset_builder.py to:
- Set all geography variables from block assignment (block_geoid, tract_geoid, cbsa_code, sldu, sldl, place_fips, vtd, puma)
- Include these variables in the saved h5 files

These variables enable granular geographic analysis at multiple levels: state legislative districts, census tracts, metro areas, cities, etc.

Requires policyengine-us#7249 for the variable definitions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent f948630 commit ca7e692
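
For readers unfamiliar with how the geography variables in the commit message relate to each other, here is a minimal sketch of the standard Census GEOID nesting: a 15-digit block GEOID contains the state (2 digits), county (3), tract (6), and block (4) codes in that order. The function name `split_block_geoid` and the sample GEOID are illustrative assumptions, not code from this repository, and the commit itself reads `tract_geoid` from the `geography` mapping rather than slicing strings. The other variables (cbsa_code, sldu, sldl, place_fips, vtd, puma) are not encoded in the GEOID string, which is why a crosswalk is needed for them.

```python
def split_block_geoid(block_geoid: str) -> dict:
    """Split a 15-digit Census block GEOID into its nested components.

    Hypothetical helper for illustration only; it is not part of
    stacked_dataset_builder.py.
    """
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return {
        "state_fips": block_geoid[:2],     # 2-digit state code
        "county_fips": block_geoid[:5],    # state + 3-digit county code
        "tract_geoid": block_geoid[:11],   # state + county + 6-digit tract code
        "block_geoid": block_geoid,        # full 15-digit identifier
    }


# Sample GEOID chosen only to show the slicing.
parts = split_block_geoid("060750201001000")
print(parts["tract_geoid"])
```

Keeping GEOIDs as strings preserves leading zeros, which would be lost if they were stored as integers.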

2 files changed

Lines changed: 22 additions & 2 deletions

changelog_entry.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 - bump: minor
   changes:
     added:
-    - Census block-level geographic assignment for households
-    - Comprehensive geography lookups from block GEOID (SLDU, SLDL, Place, VTD, PUMA)
+    - Census block-level geographic assignment for households in CD-stacked datasets
+    - Comprehensive geography variables in output (block_geoid, tract_geoid, cbsa_code, sldu, sldl, place_fips, vtd, puma)
     - Block crosswalk file mapping 8.1M blocks to all Census geographies
     - Block-to-CD distribution file for population-weighted assignment
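
The changelog mentions population-weighted assignment of households to blocks within a congressional district. The diff does not show how that assignment is implemented, so the following is only a sketch of one common approach: drawing blocks with probability proportional to population. The function name `assign_blocks`, the input shape, and the sample GEOIDs and populations are all assumptions made for illustration.

```python
import random


def assign_blocks(block_populations: dict[str, int], n_households: int, seed: int = 0) -> list[str]:
    """Draw one block per household, weighted by block population.

    Hypothetical sketch; the repository's actual assignment logic
    is not shown in this commit and may differ.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    blocks = list(block_populations)
    weights = [block_populations[b] for b in blocks]
    # random.choices samples with replacement, proportional to the weights.
    return rng.choices(blocks, weights=weights, k=n_households)


# Made-up blocks and populations for a single district.
populations = {
    "060750201001000": 120,
    "060750201001001": 30,
    "060750201002000": 850,
}
assigned = assign_blocks(populations, n_households=5)
print(assigned)
```

Sampling with replacement allows several households to share a block, which matches the fact that blocks typically contain many households.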

policyengine_us_data/datasets/cps/local_area_calibration/stacked_dataset_builder.py

Lines changed: 20 additions & 0 deletions
@@ -362,6 +362,16 @@ def create_sparse_cd_stacked_dataset(
 # Set county using indices for backwards compatibility with PolicyEngine-US
 cd_sim.set_input("county", time_period, geography["county_index"])

+# Set all other geography variables from block assignment
+cd_sim.set_input("block_geoid", time_period, geography["block_geoid"])
+cd_sim.set_input("tract_geoid", time_period, geography["tract_geoid"])
+cd_sim.set_input("cbsa_code", time_period, geography["cbsa_code"])
+cd_sim.set_input("sldu", time_period, geography["sldu"])
+cd_sim.set_input("sldl", time_period, geography["sldl"])
+cd_sim.set_input("place_fips", time_period, geography["place_fips"])
+cd_sim.set_input("vtd", time_period, geography["vtd"])
+cd_sim.set_input("puma", time_period, geography["puma"])
+
 # Note: We no longer use binary filtering for county_filter.
 # Instead, weights are scaled by P(target|CD) and all households
 # are included to avoid sample selection bias.
@@ -636,6 +646,16 @@ def create_sparse_cd_stacked_dataset(
 # spm_unit_spm_threshold is recalculated with CD-specific geo-adjustment
 vars_to_save.add("spm_unit_spm_threshold")

+# Add all geography variables set during block assignment
+vars_to_save.add("block_geoid")
+vars_to_save.add("tract_geoid")
+vars_to_save.add("cbsa_code")
+vars_to_save.add("sldu")
+vars_to_save.add("sldl")
+vars_to_save.add("place_fips")
+vars_to_save.add("vtd")
+vars_to_save.add("puma")
+
 variables_saved = 0
 variables_skipped = 0
