Adding test (and variables) for sparse matrix building logic #475

juaristi22 · 2026-01-21T15:09:51Z

baogorek

Hi María,

Nice job on this so far. We should be able to get it merged before stand-up. Please see the comments sprinkled throughout. Here is the big one that could reduce the lines of code substantially:

  The TestNationalLevelContributions class (lines 466-656) tests something new and valuable - that national-level targets receive contributions from          
  households across multiple states without geographic bias. This should stay.                                                                                
                                                                                                                                                              
  However, TestStateLevelValues and TestCrossStateRecalculation substantially overlap with the existing test_same_state.py and test_cross_state.py. Could we  
  either:                                                                                                                                                     
  1. Remove the redundant test classes and rely on the existing tests, or                                                                                     
  2. If the new tests add coverage the existing tests miss (e.g., testing more variables), consolidate by extending the existing test files rather than       
  duplicating the approach?                                                                                                                                   
                                                                                                                                                              
  This would reduce the file from ~875 lines to ~300 lines (just configuration, fixtures, helpers, basic structure tests, and TestNationalLevelContributions).

Feel free to get a second opinion on that.

I was puzzled at first by the number of files in the PR but the black "2026 stable style" was released 4 days ago, so your PR gets to be the one to bring in those changes!

policyengine_us_data/tests/test_local_area_calibration/test_sparse_matrix_builder.py

policyengine_us_data/tests/test_local_area_calibration/test_matrix_national_variation.py

juaristi22 · 2026-01-22T09:15:12Z

Sorry I missed the other tests and fixtures. Thank you for pointing me to them! I have removed the redundant tests as you suggested and renamed my file to test_matrix_national_variations.py as the tests became more specific to that. I also took a look at the already existing test_same_state and test_cross_state and made them a bit more flexible so that they automatically consider any new variables we add to conftest.py rather than only checking for snap. Let me know if I missed anything. Otherwise, I think its ready to merge.

…esh calculations

baogorek

Nice work!

- Reduce VARIABLES_TO_TEST to 3 representative variables (snap, income_tax, eitc) - Reduce COMBINED_FILTER_CONFIG to minimal subset for fast CI runs - Reduce N_VERIFICATION_SAMPLES from 2000 to 500 - Revert test_cds to original 4 states (NC, HI, MT, AK) instead of 8 states Tests now complete in ~4 minutes instead of 3+ hours. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

baogorek · 2026-01-22T19:48:44Z

@juaristi22 the tests were running for 3+ hours so I had to pull this back some. Here's an explanation:

  The core issue is combinatorial explosion in the matrix_data fixture.        

  The SparseMatrixBuilder.build_matrix() method has this computational structure:                                                                             

  For each state (derived from CDs):   
      Create a new Microsimulation (expensive - loads dataset, initializes)    
      Delete all calculated variable arrays                                    
      For each CD in that state:       
          For each target row (one per variable × geographic level):           
              Evaluate constraints                                             
              Calculate variable values for all households                     
                                                                               
  Original config: 4 states × ~5 CDs/state × ~10 SNAP targets = ~200 iterations

  PR config: 8 states × ~3 CDs/state × ~hundreds of targets (30 variables × multiple geo levels) = ~thousands of iterations                                   

  The expensive operations are:        
  1. Creating Microsimulations - each state needs a fresh sim with state_fips set                                                                             
  2. sim.calculate(variable) - called for every target × CD combination        
  3. Constraint evaluation - also calls sim.calculate() for constraint variables                                                                              
                                       
  Going from 1 variable to 30 variables doesn't just multiply time by 30 - it multiplies the number of targets (rows), and each target requires constraint    
  evaluation and variable calculation.                                                                                                                        

  How to avoid this in the future                                                                                                                             
                                       
  1. Keep CI test configs minimal - test with 2-3 representative variables, not the full production list. The full list can live in documentation or a        
  separate "extended tests" workflow.  
  2. Be wary of module-scoped fixtures - the matrix_data fixture builds once per test module, so expanding it affects all tests in that module.               
  3. Consider test markers - use @pytest.mark.slow for expensive tests and skip them in CI by default, running them only on merge to main or nightly.         
  4. Watch the CD count - each additional state requires a new Microsimulation. The original 4 states was chosen deliberately for "manageable size."

juaristi22 added 2 commits January 21, 2026 20:38

adding test for matrix building logic

569c5b3

lint after updating black

fe70932

juaristi22 requested a review from baogorek January 21, 2026 16:13

baogorek requested changes Jan 22, 2026

View reviewed changes

baogorek mentioned this pull request Jan 22, 2026

Add health insurance premiums to local area calibration, a matrix builder function, Modal model fitting #473

Merged

3 tasks

remove redundant tests and fixtures

6534151

update test_same_test so it compares to original values instead of fr…

89ce2c8

…esh calculations

juaristi22 requested a review from baogorek January 22, 2026 14:42

baogorek approved these changes Jan 22, 2026

View reviewed changes

juaristi22 and others added 2 commits January 22, 2026 21:04

adding matrix builder improvements

b18680d

baogorek merged commit 5a402f5 into main Jan 22, 2026
7 checks passed

baogorek deleted the maria/sparse_matrix_tests branch January 22, 2026 20:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding test (and variables) for sparse matrix building logic #475

Adding test (and variables) for sparse matrix building logic #475

Uh oh!

juaristi22 commented Jan 21, 2026

Uh oh!

baogorek left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

juaristi22 commented Jan 22, 2026 •

edited

Loading

Uh oh!

baogorek left a comment

Uh oh!

baogorek commented Jan 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Adding test (and variables) for sparse matrix building logic #475

Adding test (and variables) for sparse matrix building logic #475

Uh oh!

Conversation

juaristi22 commented Jan 21, 2026

Uh oh!

baogorek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

juaristi22 commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

baogorek left a comment

Choose a reason for hiding this comment

Uh oh!

baogorek commented Jan 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

juaristi22 commented Jan 22, 2026 •

edited

Loading