Add census block-level geographic assignment with comprehensive lookups #484
baogorek
left a comment
This looks very clean and well-structured. Approved, but based on the scope of this PR, and the fact that tests are failing due to no fault of this PR, I think it makes sense to finish #473 first. I will need a review after I make a few more changes.
For my own reference:

Let's add ZIP from block too
- Assign census blocks using P(block|CD) from Census population data
- Look up all geography from block GEOID for consistency:
  - County, tract, state (from GEOID structure)
  - CBSA/metro area (via NBER county crosswalk)
  - SLDU/SLDL (state legislative districts)
  - Place/City FIPS (via Census BAF)
  - PUMA (via tract crosswalk)
  - VTD (voting tabulation district)
- Add block_crosswalk.csv.gz (8.1M blocks) with BAF data
- Add make_block_crosswalk.py to generate crosswalk from Census BAFs
- Add comprehensive tests for all geography lookups

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
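The two ideas in this commit message, drawing a block from P(block|CD) and reading county, tract, and state off the block GEOID, can be illustrated with a minimal sketch. It is not the PR's implementation. The function names `sample_blocks` and `geographies_from_block_geoid` are hypothetical, and the only assumption about the data is the standard 15-digit block GEOID layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block).

```python
import random


def geographies_from_block_geoid(block_geoid: str) -> dict:
    """Derive nested geographies from a 15-digit census block GEOID.

    Layout: state (2) + county (3) + tract (6) + block (4).
    Values stay strings so leading zeros are preserved.
    """
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"Expected a 15-digit block GEOID, got {block_geoid!r}")
    return {
        "state_fips": block_geoid[:2],
        "county_fips": block_geoid[:5],   # state + county
        "tract_geoid": block_geoid[:11],  # state + county + tract
        "block_geoid": block_geoid,
    }


def sample_blocks(block_geoids, probabilities, n, seed=0):
    """Draw n blocks for one congressional district.

    `probabilities` plays the role of P(block|CD); the weights are
    relative, so they do not need to sum to exactly 1.
    """
    rng = random.Random(seed)
    return rng.choices(block_geoids, weights=probabilities, k=n)
```

Because every higher-level geography is derived from the one sampled block, the assigned county, tract, and state cannot disagree with each other, which is the consistency property the commit message describes.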
Updates stacked_dataset_builder.py to:
- Set all geography variables from block assignment (block_geoid, tract_geoid, cbsa_code, sldu, sldl, place_fips, vtd, puma)
- Include these variables in the saved h5 files

These variables enable granular geographic analysis at multiple levels: state legislative districts, census tracts, metro areas, cities, etc.

Requires policyengine-us#7249 for the variable definitions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ZCTA column to block_crosswalk.csv.gz from Census relationship file
- Add get_zcta_from_block() function and include zcta in assign_geography_for_cd()
- Save zcta to CD-stacked dataset output
- Add tests for ZCTA lookup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
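A block-to-ZCTA lookup of the kind described above might look roughly like the sketch below. It does not reproduce `get_zcta_from_block()`, whose signature is not shown here. The helper names, the column names `block_geoid` and `zcta`, and the sample rows are all made up for illustration.

```python
import csv
import io


def load_block_to_zcta(csv_text: str) -> dict:
    """Build a block GEOID -> ZCTA mapping from crosswalk CSV text.

    The csv module reads every field as a string, which keeps leading
    zeros in GEOIDs and ZCTAs. An empty zcta field becomes None, since
    a block may have no ZCTA in the source file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["block_geoid"]: (row["zcta"] or None) for row in reader}


def lookup_zcta(block_geoid: str, block_to_zcta: dict):
    """Return the ZCTA for a block, or None if unknown or unassigned."""
    return block_to_zcta.get(block_geoid)


# Made-up rows, purely to show the shape of the data.
SAMPLE_CROSSWALK = """block_geoid,zcta
010010201001000,00001
010010201001001,
"""

mapping = load_block_to_zcta(SAMPLE_CROSSWALK)
```

Returning `None` for both missing and unassigned blocks keeps the caller's handling simple. Whether the actual code distinguishes those two cases is not stated in the commit message.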
ca7e692 to 0906da9
Summary
Data files
- block_cd_distributions.csv.gz (25MB): P(block|CD) from 2020 Census populations + 119th Congress BEF
- block_crosswalk.csv.gz (21MB): 8.1M blocks mapped to SLDU, SLDL, Place, VTD, PUMA from Census BAFs

Test plan
🤖 Generated with Claude Code