Survey Data Support: History and Current State

This document is the technical reference for survey-design support in diff-diff. It records the build history (Phases 1-10) as shipped and documents current limitations. Forward-looking roadmap items live in ROADMAP.md; this file is the historical and technical companion.

What's Shipped

Phases 1-2: Core Infrastructure

SurveyDesign class with weights, strata, PSU, FPC, weight_type, nest, lonely_psu
Taylor Series Linearization (TSL) variance with strata + PSU + FPC
Weighted OLS, sandwich estimator, demeaning, survey degrees of freedom
SurveyMetadata on results (effective n, DEFF, weight_range)
Base estimators: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD
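To make the TSL item concrete, here is a minimal NumPy sketch of the stratified-cluster linearization variance for a weighted (Hájek) mean. It illustrates the textbook formula only and is not the diff-diff implementation: the function name `tsl_variance_of_mean` and the `fpc` argument (a dict of per-stratum sampling fractions) are assumptions made for this example.

```python
import numpy as np


def tsl_variance_of_mean(y, w, strata, psu, fpc=None):
    """Weighted mean and its Taylor-linearization variance.

    fpc: optional dict {stratum label: sampling fraction n_h / N_h}
         (hypothetical argument, used only in this sketch).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)

    w_sum = w.sum()
    mean = np.sum(w * y) / w_sum
    # Linearized variable: each unit's weighted influence contribution.
    z = w * (y - mean) / w_sum

    var = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        psus = np.unique(psu[in_h])
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU")
        # PSU totals of z inside stratum h (PSU ids may repeat across strata).
        totals = np.array([z[in_h & (psu == g)].sum() for g in psus])
        f_h = 0.0 if fpc is None else fpc[h]
        var += (1.0 - f_h) * n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return mean, var
```

With equal weights, one stratum, and every unit as its own PSU, the result reduces to the familiar `s**2 / n`, which is a convenient sanity check.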

Phase 3: OLS-Based Standalone Estimators

Estimator	Survey Support	Notes
StackedDiD	pweight only	Q-weights compose multiplicatively; fweight/aweight rejected
SunAbraham	Full	Bootstrap via Rao-Wu rescaled
BaconDecomposition	Diagnostic	Weighted descriptives only, no inference
TripleDifference	Full	Regression, IPW, and DR methods with TSL on IFs
ContinuousDiD	Full	Weighted B-spline OLS + TSL; bootstrap via multiplier at PSU
EfficientDiD	Full	No-cov and DR covariate paths both survey-weighted; bootstrap via multiplier at PSU
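The "multiplier at PSU" bootstrap named in the table can be sketched as follows. This is a generic cluster multiplier bootstrap under stated assumptions, not the library's code: it uses Rademacher multipliers, ignores strata and FPC, and assumes the per-unit influence contributions `psi` are already scaled so that the estimation error is approximately their sum. The function name is hypothetical.

```python
import numpy as np


def psu_multiplier_bootstrap_se(psi, psu, n_boot=999, seed=0):
    """Bootstrap SE from PSU-level multiplier draws.

    psi: per-unit influence contributions (already weighted and scaled).
    psu: PSU label for each unit.
    """
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    # Map arbitrary PSU labels to 0..G-1 and sum psi within each PSU.
    _, psu_idx = np.unique(np.asarray(psu), return_inverse=True)
    psu_totals = np.bincount(psu_idx, weights=psi)
    n_psu = len(psu_totals)
    # One Rademacher multiplier per PSU per draw: all units in a PSU
    # share the same multiplier, which preserves within-PSU dependence.
    xi = rng.choice([-1.0, 1.0], size=(n_boot, n_psu))
    draws = xi @ psu_totals
    return draws.std(ddof=1), draws
```

Drawing one multiplier per PSU rather than per unit is what makes the resampling respect the cluster structure of the design.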

Phase 4: Complex Estimators + Weighted Logit

Estimator	Survey Support	Notes
ImputationDiD	Full	Weighted iterative FE + conservative variance; bootstrap via multiplier at PSU
TwoStageDiD	Full	Weighted FE + GMM sandwich; bootstrap via multiplier at PSU
CallawaySantAnna	Full	Strata/PSU/FPC/replicate weights; IPW/DR covariates (Phase 7a); replicate IF variance

Weighted solve_logit() in linalg.py — survey weights enter IRLS as w_survey * mu * (1 - mu).
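As an illustration of how the survey weight enters the working weights, the following is a minimal Newton/IRLS sketch for a weighted logit. It is not the `solve_logit()` source; the function name, arguments, and convergence rule are assumptions for this example.

```python
import numpy as np


def weighted_logit_irls(X, y, w, max_iter=50, tol=1e-10):
    """Survey-weighted logistic regression via Newton/IRLS (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        # Working weights: survey weight times the logit variance function.
        working_w = w * mu * (1.0 - mu)
        score = X.T @ (w * (y - mu))
        hessian = X.T @ (X * working_w[:, None])
        step = np.linalg.solve(hessian, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

For an intercept-only model the solution is the logit of the weighted mean of `y`, which gives a quick check that the weights are applied in both the score and the Hessian.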

Phase 5: SyntheticDiD + TROP

Estimator	Survey Support	Notes
SyntheticDiD	pweight	Treated means survey-weighted; omega composed with control weights post-optimization
TROP	pweight	Population-weighted ATT aggregation; model fitting unchanged

Phase 6: Advanced Features (v2.7.6)

Survey-aware bootstrap for all 8 bootstrap-using estimators: multiplier at PSU (CS, Imputation, TwoStage, Continuous, Efficient) and Rao-Wu rescaled (SA, SyntheticDiD, TROP)
Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR. 12 of 16 estimators supported (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD)
DEFF diagnostics: per-coefficient design effects vs SRS baseline
Subpopulation analysis: SurveyDesign.subpopulation() preserves full design structure for correct variance
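The replicate-weight methods above differ mainly in the scale factor applied to the squared deviations of the replicate estimates. The sketch below uses the standard factors for BRR, Fay's BRR, JK1, and SDR; JKn is omitted because it needs per-replicate (stratum-specific) factors. The function name and signature are assumptions for illustration and do not mirror the diff-diff API.

```python
import numpy as np


def replicate_variance(theta_full, theta_reps, method, fay_rho=0.0):
    """Variance from replicate estimates, centered on the full-sample estimate."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    n_rep = len(theta_reps)
    scale = {
        "BRR": 1.0 / n_rep,
        "Fay": 1.0 / (n_rep * (1.0 - fay_rho) ** 2),
        "JK1": (n_rep - 1.0) / n_rep,
        "SDR": 4.0 / n_rep,
    }[method]
    return scale * np.sum((theta_reps - theta_full) ** 2)


# Usage: delete-one jackknife of a simple mean.
y = np.array([1.0, 2.0, 3.0, 4.0])
reps = np.array([np.delete(y, r).mean() for r in range(len(y))])
v_jk1 = replicate_variance(y.mean(), reps, "JK1")
```

In the usage example the JK1 result coincides with the simple-random-sampling variance of the mean, `s**2 / n`.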

Phase 7: Completing the Survey Story (v2.8.0-v2.8.1)

7a. CS IPW/DR covariates + survey: DRDID nuisance IF corrections (Sant'Anna & Zhao 2020, Theorem 3.1)
7b. Repeated cross-sections: CallawaySantAnna(panel=False) matching DRDID::reg_did_rc, drdid_rc, std_ipw_did_rc
7c. Survey tutorial: docs/tutorials/16_survey_did.ipynb with full workflow (strata, PSU, FPC, replicates, subpopulation, DEFF)
7d. HonestDiD + survey: survey df and event-study VCV propagated to sensitivity analysis with t-distribution critical values
7e. StaggeredTripleDifference survey support (only implementation in R or Python with design-based DDD variance)

Phase 8: Survey Maturity (v2.8.3-v2.8.4)

8a. SDR replicate method for ACS PUMS (80 columns)
8b. FPC in ImputationDiD and TwoStageDiD
8c. Warnings for previously silent operations (8 operations now emit UserWarning)
8d. Lonely PSU "adjust" in bootstrap (Rust & Rao 1996)
8e. CV on estimates, trim_weights(), survey-aware ImputationDiD pretrends
8f. Compatibility matrix in choosing_estimator.rst

Phase 9: Real-Data Validation (v2.9.0)

15 cross-validation tests against R's survey package using real federal survey datasets:

Dataset	Design	Key result
API (R `survey`)	Strata + FPC	ATT, SE, df, CI match R (7 variants incl. subpopulation, Fay's BRR)
NHANES (CDC/NCHS)	Strata + PSU (nest=TRUE)	ACA DiD matches R for strata+PSU, covariates, subpopulation
RECS 2020 (U.S. EIA)	60 JK1 replicate weights	Coefficients, SEs, df, CI match R

Files: benchmarks/R/benchmark_realdata_*.R, tests/test_survey_real_data.py, benchmarks/data/real/*_realdata_golden.json

Documentation Remaining (Phase 8g)

Multi-stage design: not yet documented. Single-stage (strata + PSU) is sufficient per Lumley (2004) Section 2.2.
Post-stratification / calibration: not yet documented. SurveyDesign expects pre-calibrated weights. samplics is the most complete Python option (post-stratification, raking, GREG) but is in read-only mode — active development has moved to svy, which is not yet publicly released. weightipy is actively maintained for raking. Weight calibration is out of scope for diff-diff today, though building this capability is a future possibility.

Phase 10: Survey Completeness (v2.9.0–v3.0)

10a. Survey theory document (survey-theory.md) — formal justification for design-based variance with modern DiD influence functions
10b. Research-grade survey DGP — 9 parameters on generate_survey_did_data() (8 research-grade + conditional_pt)
10c. R validation expansion — 8 of 16 estimators cross-validated against R's survey::svyglm()
10d. Tutorial rewrite — flat-weight vs design-based comparison with known ground truth
10f. WooldridgeDiD survey support — OLS, logit, Poisson paths with pweight + strata/PSU/FPC + TSL variance

v3.0.1: Survey Aggregation Helper

aggregate_survey() (in diff_diff.prep) bridges individual-level survey microdata (BRFSS, ACS, CPS, NHANES) to geographic-period panels for second-stage DiD estimation. It computes design-based cell means using domain estimation (Lumley 2004, Section 3.4), with an SRS fallback for small cells, and returns a panel DataFrame plus a pre-configured SurveyDesign for the second-stage fit. The default second_stage_weights="pweight" (population weights) is compatible with all survey-capable estimators; the opt-in "aweight" (precision weights) provides efficiency-weighted estimates for estimators that accept it. Both TSL and replicate-weight variance are supported.
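The point-estimate half of this step, weighted cell means by geography and period, can be sketched in pandas as below. This is not the `aggregate_survey()` signature: the helper name and the column names are hypothetical, and the design-based variance of each cell mean (the domain-estimation part) is deliberately left out.

```python
import pandas as pd


def weighted_cell_means(df, outcome, weight, cell_cols):
    """Hajek cell means: sum(w * y) / sum(w) within each cell."""
    grouped = (
        df.assign(_wy=df[outcome] * df[weight])
        .groupby(cell_cols, as_index=False)
        .agg(wy_sum=("_wy", "sum"), w_sum=(weight, "sum"), n_obs=(outcome, "count"))
    )
    grouped["mean"] = grouped["wy_sum"] / grouped["w_sum"]
    return grouped[cell_cols + ["mean", "w_sum", "n_obs"]]


# Usage with made-up microdata (column names are illustrative only).
micro = pd.DataFrame(
    {
        "state": ["A", "A", "B"],
        "year": [2020, 2020, 2020],
        "y": [1.0, 3.0, 5.0],
        "w": [1.0, 3.0, 2.0],
    }
)
panel = weighted_cell_means(micro, "y", "w", ["state", "year"])
```

Keeping `w_sum` and `n_obs` alongside the mean is useful because the second stage needs a cell-level weight, and small cells are the ones where a fallback variance rule would apply.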

See docs/api/prep.rst for the API reference and docs/methodology/REGISTRY.md for the methodology entry.

Phase 10: Academic Grounding (History)

The Phase 10 items established the theoretical and empirical foundation for survey-design variance estimation on modern DiD influence functions. All items below are shipped; this section documents what was done and why.

10a. Theory Document ✅

docs/methodology/survey-theory.md lays out the formal argument for design-based variance estimation with modern DiD influence functions:

Modern heterogeneity-robust DiD estimators (CS, SA, BJS) are smooth functionals of the weighted empirical distribution
Survey-weighted empirical distribution is design-consistent for the finite-population quantity (Hájek/design-weighted estimator)
The influence function is a property of the functional, not the sampling design — IFs remain valid under survey weighting
TSL (stratified cluster sandwich) and replicate-weight methods are valid variance estimators for smooth functionals of survey-weighted estimating equations (Binder 1983, Rao & Wu 1988, Shao 1996)
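For reference, the stratified-cluster linearization variance that this argument relies on can be written as follows. The notation is introduced here for illustration and is not taken from survey-theory.md: $\hat\psi_i$ is the estimated influence function of unit $i$, scaled so that $\hat\theta - \theta \approx \sum_i w_i \hat\psi_i$; $n_h$ is the number of sampled PSUs in stratum $h$; and $f_h$ is the stratum sampling fraction (zero when no FPC is supplied).

```latex
\hat{V}_{\mathrm{TSL}}(\hat\theta)
  = \sum_{h=1}^{H} (1 - f_h)\,\frac{n_h}{n_h - 1}
    \sum_{j=1}^{n_h} \left( z_{hj} - \bar{z}_h \right)^2,
\qquad
z_{hj} = \sum_{i \in \mathrm{PSU}(h,j)} w_i\,\hat\psi_i,
\qquad
\bar{z}_h = \frac{1}{n_h} \sum_{j=1}^{n_h} z_{hj}.
```

Only the PSU totals $z_{hj}$ depend on the estimator; the outer stratum-by-stratum formula is the same for any smooth functional, which is the practical content of the third bullet.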

This was the short-term deliverable: it could be linked from the docs and README as soon as it shipped.

Key references:

Binder, D.A. (1983). "On the Variances of Asymptotically Normal Estimators from Complex Surveys." International Statistical Review 51.
Rao, J.N.K. & Wu, C.F.J. (1988). "Resampling Inference with Complex Survey Data." JASA 83(401).
Shao, J. (1996). "Resampling Methods in Sample Surveys." Statistics 27.

10b. Survey Simulation DGP ✅

Enhanced generate_survey_did_data() with 8 research-grade parameters: icc, weight_cv, informative_sampling, heterogeneous_te_by_strata, te_covariate_interaction, covariate_effects, strata_sizes, and return_true_population_att. All backward-compatible. Supports panel and repeated cross-section modes.

Resolved: conditional_pt parameter added. When nonzero, shifts treated units' x1 mean by +1 SD and adds conditional_pt * x1_i * (t/T) to the outcome, creating X-dependent time trends. Unconditional PT fails; conditional PT holds after covariate adjustment. DR/IPW estimators recover truth.
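The mechanism can be reproduced in a few lines. The sketch below is a stand-alone toy simulation of untreated potential outcomes, not `generate_survey_did_data()`: the function names, the unit-variance covariate, and the noise model are assumptions. It shows the raw treated-minus-control trend gap and the gap after linear adjustment for x1.

```python
import numpy as np


def simulate_conditional_pt(n=20000, T=4, conditional_pt=1.0, seed=0):
    """Untreated outcomes with an x1-dependent time trend (toy DGP)."""
    rng = np.random.default_rng(seed)
    treated = rng.integers(0, 2, size=n)
    # Treated units' x1 mean is shifted by +1 SD (SD = 1 here).
    x1 = rng.normal(loc=treated.astype(float), scale=1.0)
    t = np.arange(1, T + 1)
    # Trend term conditional_pt * x1_i * (t / T), plus idiosyncratic noise.
    y0 = conditional_pt * x1[:, None] * (t[None, :] / T) + rng.normal(size=(n, T))
    return treated, x1, y0


def trend_gaps(treated, x1, y0):
    """Raw and x1-adjusted differences in the first-to-last-period trend."""
    delta = y0[:, -1] - y0[:, 0]
    raw_gap = delta[treated == 1].mean() - delta[treated == 0].mean()
    Z = np.column_stack([np.ones(len(delta)), treated, x1])
    coef, *_ = np.linalg.lstsq(Z, delta, rcond=None)
    return raw_gap, coef[1]  # coef[1]: treated coefficient after adjustment


treated, x1, y0 = simulate_conditional_pt()
raw_gap, adjusted_gap = trend_gaps(treated, x1, y0)
```

In expectation the raw gap equals `conditional_pt * (T - 1) / T` times the covariate mean shift, while the adjusted gap is zero, which is the sense in which unconditional parallel trends fails and conditional parallel trends holds.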

10c. Expand R Validation Coverage ✅

8 of 16 estimators now cross-validated against R's survey::svyglm(): DifferenceInDifferences, TWFE, CallawaySantAnna, SyntheticDiD, ImputationDiD, StackedDiD, SunAbraham, TripleDifference.

10d. Tutorial: Show the Pain ✅

Survey tutorial rewritten with side-by-side flat-weight vs design-based comparison using the research-grade DGP from 10b, showing known ground truth, coverage simulation, and false pre-trend detection rates.

10f. WooldridgeDiD Survey Support ✅

WooldridgeDiD (ETWFE) now supports survey_design for all three methods (OLS, logit, Poisson) with pweight only (fweight/aweight rejected). OLS uses survey-weighted within-transformation + WLS + TSL vcov. Logit/Poisson use survey-weighted IRLS + X_tilde linearization for TSL vcov. Replicate-weight designs raise NotImplementedError; bootstrap + survey is rejected.

10g. Practitioner Guidance ✅

Subsumed by the practitioner decision tree (docs/practitioner_decision_tree.rst) and the practitioner getting-started guide (docs/practitioner_getting_started.rst). The Brand Awareness Survey DiD tutorial (docs/tutorials/17_brand_awareness_survey.ipynb) demonstrates the full workflow end-to-end; DEFF diagnostics provide the empirical signal for whether survey design matters on a given dataset.

Current Limitations

All items below raise an error when attempted, with a message describing the limitation and a suggested alternative.

Estimator	Limitation	Alternative
SyntheticDiD	Replicate weights	Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap
TROP	Replicate weights	Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap
BaconDecomposition	Replicate weights	Diagnostic only, no inference
SyntheticDiD	`variance_method='placebo'` + strata/PSU/FPC	Use `variance_method='bootstrap'`
ImputationDiD	`pretrends=True` + replicate weights	Use analytical survey design instead
ImputationDiD	`pretrend_test()` + replicate weights	Use analytical survey design instead
DiD, TWFE	`inference='wild_bootstrap'` + `survey_design`	Use analytical survey inference (default)
EfficientDiD	`cluster` + `survey_design`	Use `survey_design` with PSU/strata
All bootstrap estimators	Bootstrap + replicate weights	These are alternative variance methods; pick one

Warning/fallback (no error): MultiPeriodDiD with wild_bootstrap + survey_design warns and falls back to analytical inference.

Conservative approach (no error): CallawaySantAnna with reg + covariates uses a conservative plug-in IF rather than the efficient DRDID nuisance IF correction (see REGISTRY.md).
