This document is the technical reference for survey-design support in diff-diff. It records the build history (Phases 1-10) as shipped and documents current limitations. Forward-looking roadmap items live in ROADMAP.md; this file is the historical and technical companion.
- SurveyDesign class with weights, strata, PSU, FPC, weight_type, nest, lonely_psu
- Taylor Series Linearization (TSL) variance with strata + PSU + FPC
- Weighted OLS, sandwich estimator, demeaning, survey degrees of freedom
- SurveyMetadata on results (effective n, DEFF, weight_range)
- Base estimators: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD
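As a rough illustration of the weight summaries listed above, the sketch below computes a Kish effective sample size and the matching weighting design effect. Whether SurveyMetadata uses exactly these formulas is an assumption; kish_summary is a hypothetical helper, not part of diff-diff.

```python
import numpy as np


def kish_summary(weights):
    """Summarize a weight vector (hypothetical helper, not the diff-diff API)."""
    w = np.asarray(weights, dtype=float)
    n = w.size
    # Kish effective sample size: (sum w)^2 / sum w^2
    n_eff = w.sum() ** 2 / np.sum(w ** 2)
    return {
        "n": n,
        "effective_n": n_eff,
        # Design effect from unequal weighting alone: n / n_eff
        "deff_weights": n / n_eff,
        "weight_range": (w.min(), w.max()),
    }
```

Equal weights give effective_n equal to n and a design effect of 1; more dispersed weights push the effective sample size down.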
| Estimator | Survey Support | Notes |
|---|---|---|
| StackedDiD | pweight only | Q-weights compose multiplicatively; fweight/aweight rejected |
| SunAbraham | Full | Bootstrap via Rao-Wu rescaled |
| BaconDecomposition | Diagnostic | Weighted descriptives only, no inference |
| TripleDifference | Full | Regression, IPW, and DR methods with TSL on IFs |
| ContinuousDiD | Full | Weighted B-spline OLS + TSL; bootstrap via multiplier at PSU |
| EfficientDiD | Full | No-cov and DR covariate paths both survey-weighted; bootstrap via multiplier at PSU |
| Estimator | Survey Support | Notes |
|---|---|---|
| ImputationDiD | Full | Weighted iterative FE + conservative variance; bootstrap via multiplier at PSU |
| TwoStageDiD | Full | Weighted FE + GMM sandwich; bootstrap via multiplier at PSU |
| CallawaySantAnna | Full | Strata/PSU/FPC/replicate weights; IPW/DR covariates (Phase 7a); replicate IF variance |
Weighted solve_logit() in linalg.py — survey weights enter IRLS as
w_survey * mu * (1 - mu).
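A minimal sketch of how survey weights can enter IRLS for a logit, assuming a plain Newton update with no step-halving or separation checks. The function name weighted_logit_irls and its arguments are hypothetical; this is not the solve_logit() implementation.

```python
import numpy as np


def weighted_logit_irls(X, y, w_survey, max_iter=50, tol=1e-10):
    """Survey-weighted logit via Newton/IRLS (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w_survey = np.asarray(w_survey, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        # IRLS working weight: survey weight times the binomial variance mu(1 - mu)
        w_irls = w_survey * mu * (1.0 - mu)
        # Weighted score and weighted information matrix
        score = X.T @ (w_survey * (y - mu))
        info = X.T @ (w_irls[:, None] * X)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

One sanity check on this sketch: with integer weights, the weighted fit matches an unweighted fit on data where each row is repeated according to its weight.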
| Estimator | Survey Support | Notes |
|---|---|---|
| SyntheticDiD | pweight | Treated means survey-weighted; omega composed with control weights post-optimization |
| TROP | pweight | Population-weighted ATT aggregation; model fitting unchanged |
- Survey-aware bootstrap for all 8 bootstrap-using estimators: multiplier at PSU (CS, Imputation, TwoStage, Continuous, Efficient) and Rao-Wu rescaled (SA, SyntheticDiD, TROP)
- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR. 12 of 16 estimators supported (not SyntheticDiD, TROP, BaconDecomposition, or WooldridgeDiD)
- DEFF diagnostics: per-coefficient design effects vs SRS baseline
- Subpopulation analysis: SurveyDesign.subpopulation() preserves full design structure for correct variance
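For the replicate-weight methods listed above, the variance is a scaled sum of squared deviations of the replicate estimates from the full-sample estimate. The sketch below uses the standard textbook scale factors and centers on the full-sample estimate; both are assumptions about convention, and replicate_variance is a hypothetical helper. JKn is omitted because it needs per-replicate scale factors.

```python
import numpy as np


def replicate_variance(theta_full, theta_reps, method="JK1", fay_rho=0.0):
    """Replicate-weight variance of a scalar estimate (illustrative only)."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    R = theta_reps.size
    # Standard scale factors; fay_rho is only used for Fay's BRR
    scales = {
        "BRR": 1.0 / R,
        "Fay": 1.0 / (R * (1.0 - fay_rho) ** 2),
        "JK1": (R - 1.0) / R,
        "SDR": 4.0 / R,
    }
    if method not in scales:
        raise ValueError(f"unsupported method: {method}")
    return scales[method] * np.sum((theta_reps - theta_full) ** 2)
```

For a simple mean with delete-one replicates, the JK1 branch reproduces the usual s^2 / n variance of the sample mean.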
- 7a. CS IPW/DR covariates + survey: DRDID nuisance IF corrections (Sant'Anna & Zhao 2020, Theorem 3.1)
- 7b. Repeated cross-sections: CallawaySantAnna(panel=False) matching DRDID::reg_did_rc, drdid_rc, std_ipw_did_rc
- 7c. Survey tutorial: docs/tutorials/16_survey_did.ipynb with full workflow (strata, PSU, FPC, replicates, subpopulation, DEFF)
- 7d. HonestDiD + survey: survey df and event-study VCV propagated to sensitivity analysis with t-distribution critical values
- 7e. StaggeredTripleDifference survey support (only implementation in R or Python with design-based DDD variance)
- 8a. SDR replicate method for ACS PUMS (80 columns)
- 8b. FPC in ImputationDiD and TwoStageDiD
- 8c. Silent operation warnings (8 operations now emit UserWarning)
- 8d. Lonely PSU "adjust" in bootstrap (Rust & Rao 1996)
- 8e. CV on estimates, trim_weights(), survey-aware ImputationDiD pretrends
- 8f. Compatibility matrix in choosing_estimator.rst
15 cross-validation tests against R's survey package using real federal
survey datasets:
| Dataset | Design | Key result |
|---|---|---|
| API (R survey) | Strata + FPC | ATT, SE, df, CI match R (7 variants incl. subpopulation, Fay's BRR) |
| NHANES (CDC/NCHS) | Strata + PSU (nest=TRUE) | ACA DiD matches R for strata+PSU, covariates, subpopulation |
| RECS 2020 (U.S. EIA) | 60 JK1 replicate weights | Coefficients, SEs, df, CI match R |
Files: benchmarks/R/benchmark_realdata_*.R, tests/test_survey_real_data.py,
benchmarks/data/real/*_realdata_golden.json
- Multi-stage design: not yet documented. Single-stage (strata + PSU) is sufficient per Lumley (2004) Section 2.2.
- Post-stratification / calibration: not yet documented. SurveyDesign expects pre-calibrated weights. samplics is the most complete Python option (post-stratification, raking, GREG) but is in read-only mode; active development has moved to svy, which is not yet publicly released. weightipy is actively maintained for raking. Weight calibration is out of scope for diff-diff today, though building this capability is a future possibility.
- 10a. Survey theory document (survey-theory.md): formal justification for design-based variance with modern DiD influence functions
- 10b. Research-grade survey DGP: 9 parameters on generate_survey_did_data() (8 research-grade + conditional_pt)
- 10c. R validation expansion: 8 of 16 estimators cross-validated against R's survey::svyglm()
- 10d. Tutorial rewrite: flat-weight vs design-based comparison with known ground truth
- 10f. WooldridgeDiD survey support: OLS, logit, Poisson paths with pweight + strata/PSU/FPC + TSL variance
aggregate_survey() (in diff_diff.prep) bridges individual-level survey
microdata (BRFSS, ACS, CPS, NHANES) to geographic-period panels for
second-stage DiD estimation. Computes design-based cell means using domain
estimation (Lumley 2004, Section 3.4), with SRS fallback for small cells. Returns a
panel DataFrame plus a pre-configured SurveyDesign for the second-stage
fit. Default second_stage_weights="pweight" (population weights) is
compatible with all survey-capable estimators; opt-in "aweight" (precision
weights) provides efficiency-weighted estimates for estimators that accept it.
Supports both TSL and replicate-weight variance.
See docs/api/prep.rst for the API reference and docs/methodology/REGISTRY.md
for the methodology entry.
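To make the cell-mean step concrete, the sketch below collapses microdata to geography-period cells using weighted (Hájek) means. It covers the point estimates only: the design-based cell variances, the SRS fallback, and the returned SurveyDesign are not modeled. weighted_cell_means and its column arguments are hypothetical and do not reflect the aggregate_survey() signature.

```python
import pandas as pd


def weighted_cell_means(micro, outcome, weight, cell_cols):
    """Collapse microdata to one row per cell with a weighted mean (illustrative only)."""
    df = micro.assign(_wy=micro[weight] * micro[outcome])
    cells = df.groupby(cell_cols, as_index=False).agg(
        _wy=("_wy", "sum"),        # sum of w * y within the cell
        _w=(weight, "sum"),        # sum of weights (estimated cell population)
        n_obs=(outcome, "count"),  # respondents with a non-missing outcome
    )
    # Hajek (ratio) estimator of the cell mean
    cells[f"{outcome}_mean"] = cells["_wy"] / cells["_w"]
    return cells.drop(columns="_wy").rename(columns={"_w": "pop_weight"})
```

Under these assumptions, the pop_weight column is the natural candidate for a population weight in the second-stage fit, and n_obs flags small cells.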
The Phase 10 items established the theoretical and empirical foundation for survey-design variance estimation on modern DiD influence functions. All items below are shipped; this section documents what was done and why.
docs/methodology/survey-theory.md lays out the formal argument for
design-based variance estimation with modern DiD influence functions:
- Modern heterogeneity-robust DiD estimators (CS, SA, BJS) are smooth functionals of the weighted empirical distribution
- Survey-weighted empirical distribution is design-consistent for the finite-population quantity (Hájek/design-weighted estimator)
- The influence function is a property of the functional, not the sampling design — IFs remain valid under survey weighting
- TSL (stratified cluster sandwich) and replicate-weight methods are valid variance estimators for smooth functionals of survey-weighted estimating equations (Binder 1983, Rao & Wu 1988, Shao 1996)
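The argument above can be sketched numerically: given per-observation weighted influence-function contributions, the TSL variance sums them to PSU totals and takes the between-PSU variance within each stratum. The sketch assumes psi_w is scaled so that the estimation error is approximately sum(psi_w), and that fpc maps each stratum to its population PSU count. Both conventions and the function name tsl_variance are hypothetical, and lonely-PSU handling is reduced to an error.

```python
import numpy as np


def tsl_variance(psi_w, strata, psu, fpc=None):
    """Stratified between-PSU variance of linearized values (illustrative only)."""
    psi_w = np.asarray(psi_w, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    total = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        psu_ids = np.unique(psu[in_h])
        n_h = psu_ids.size
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU")
        # PSU totals of the linearized values within stratum h
        z = np.array([psi_w[in_h & (psu == j)].sum() for j in psu_ids])
        # Sampling fraction for the finite population correction (0 if no FPC)
        f_h = 0.0 if fpc is None else n_h / fpc[h]
        total += (1.0 - f_h) * n_h / (n_h - 1.0) * np.sum((z - z.mean()) ** 2)
    return total
```

With one stratum, one observation per PSU, and psi_w = (y - mean(y)) / n, this reduces to the familiar s^2 / n for a sample mean, shrunk by the FPC when one is supplied.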
This is the short-term deliverable that can be linked from docs and README immediately.
Key references:
- Binder, D.A. (1983). "On the Variances of Asymptotically Normal Estimators from Complex Surveys." International Statistical Review 51.
- Rao, J.N.K. & Wu, C.F.J. (1988). "Resampling Inference with Complex Survey Data." JASA 83(401).
- Shao, J. (1996). "Resampling Methods in Sample Surveys." Statistics 27.
Enhanced generate_survey_did_data() with 8 research-grade parameters:
icc, weight_cv, informative_sampling, heterogeneous_te_by_strata,
te_covariate_interaction, covariate_effects, strata_sizes, and
return_true_population_att. All backward-compatible. Supports panel
and repeated cross-section modes.
Resolved: conditional_pt parameter added. When nonzero, shifts treated
units' x1 mean by +1 SD and adds conditional_pt * x1_i * (t/T) to the
outcome, creating X-dependent time trends. Unconditional PT fails; conditional
PT holds after covariate adjustment. DR/IPW estimators recover truth.
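A toy version of this mechanism, assuming a two-group panel with periods indexed t = 1..T and treatment switching on in the second half. The functions simulate_conditional_pt and naive_and_adjusted_did are hypothetical and much simpler than generate_survey_did_data(); plain regression adjustment stands in for the DR/IPW estimators.

```python
import numpy as np


def simulate_conditional_pt(n_units=2000, n_periods=4, conditional_pt=1.0, att=2.0, seed=0):
    """Toy panel where parallel trends holds only conditional on x1 (illustrative only)."""
    rng = np.random.default_rng(seed)
    treated = rng.integers(0, 2, size=n_units)
    # Treated units' x1 mean is shifted by +1 SD
    x1 = rng.normal(size=n_units) + 1.0 * treated
    t = np.arange(1, n_periods + 1)
    # X-dependent time trend: conditional_pt * x1_i * (t / T)
    trend = conditional_pt * x1[:, None] * (t[None, :] / n_periods)
    post = (t > n_periods // 2).astype(float)
    noise = rng.normal(scale=0.1, size=(n_units, n_periods))
    y = trend + att * treated[:, None] * post[None, :] + noise
    return treated, x1, y


def naive_and_adjusted_did(treated, x1, y):
    """First-vs-last-period DiD, with and without adjusting for x1."""
    dy = y[:, -1] - y[:, 0]
    naive = dy[treated == 1].mean() - dy[treated == 0].mean()
    # Regress the outcome change on treatment and x1; the treatment coefficient is the adjusted DiD
    Z = np.column_stack([np.ones_like(dy), treated, x1])
    coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    return naive, coef[1]
```

In this toy setup the unadjusted DiD absorbs the treated group's steeper x1-driven trend and overstates the effect, while the x1-adjusted estimate lands near the true ATT.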
8 of 16 estimators now cross-validated against R's survey::svyglm():
DifferenceInDifferences, TWFE, CallawaySantAnna, SyntheticDiD,
ImputationDiD, StackedDiD, SunAbraham, TripleDifference.
Survey tutorial rewritten with side-by-side flat-weight vs design-based comparison using the research-grade DGP from 10b, showing known ground truth, coverage simulation, and false pre-trend detection rates.
WooldridgeDiD (ETWFE) now supports survey_design for all three methods
(OLS, logit, Poisson) with pweight only (fweight/aweight rejected).
OLS uses survey-weighted within-transformation + WLS + TSL vcov.
Logit/Poisson use survey-weighted IRLS + X_tilde linearization for TSL
vcov. Replicate-weight designs raise NotImplementedError; bootstrap +
survey is rejected.
Subsumed by the practitioner decision tree
(docs/practitioner_decision_tree.rst) and the practitioner
getting-started guide (docs/practitioner_getting_started.rst).
The Brand Awareness Survey DiD tutorial
(docs/tutorials/17_brand_awareness_survey.ipynb) demonstrates the
full workflow end-to-end; DEFF diagnostics provide the empirical signal
for whether survey design matters on a given dataset.
All items below raise an error when attempted, with a message describing the limitation and suggested alternative.
| Estimator | Limitation | Alternative |
|---|---|---|
| SyntheticDiD | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap |
| TROP | Replicate weights | Use strata/PSU/FPC design with Rao-Wu rescaled bootstrap |
| BaconDecomposition | Replicate weights | Diagnostic only, no inference |
| SyntheticDiD | variance_method='placebo' + strata/PSU/FPC | Use variance_method='bootstrap' |
| ImputationDiD | pretrends=True + replicate weights | Use analytical survey design instead |
| ImputationDiD | pretrend_test() + replicate weights | Use analytical survey design instead |
| DiD, TWFE | inference='wild_bootstrap' + survey_design | Use analytical survey inference (default) |
| EfficientDiD | cluster + survey_design | Use survey_design with PSU/strata |
| All bootstrap estimators | Bootstrap + replicate weights | These are alternative variance methods; pick one |
Warning/fallback (no error): MultiPeriodDiD with wild_bootstrap +
survey_design warns and falls back to analytical inference.
Conservative approach (no error): CallawaySantAnna reg+covariates
uses conservative plug-in IF rather than efficient DRDID nuisance IF
correction (see REGISTRY.md).