Add natural gas price regression pipeline #2
wesleyjcole left a comment
Some comments on the PR text:
- Would you write out OLS?
- You note fetching the scenario-specific prices from the API in the alpha regression step, but don't you have to do that first because they are needed for the beta regression step?
- You list "ReEDS natural gas price model documentation (internal)" as a relevant source -- I don't know what this is pointing to, and I thought our documentation for the NG supply curves was in the appendix of the model documentation.
- The template includes an AI use disclosure statement that I didn't see here--that would be good to add.
The Python code seems a little verbose, or at least it seems like what is happening shouldn't need so many lines and functions throughout the scripts. Given that you have been using AI for these kinds of tasks, you might ask it if it can accomplish the same actions with less code and see what happens. I'd be interested in some consolidation if that's easily feasible.
| - `ng_AEO_historical.csv` — Historical NG prices | ||
| - `ng_demand_AEO_historical.csv` — Historical electric sector NG demand | ||
| - `ng_tot_demand_AEO_historical.csv` — Historical total sector NG demand |
Where are these files from? Are they just the files from the ReEDS repository?
| # Old scripts (replaced by visualization.py) | ||
| results_visualization.py | ||
| results_validation.py | ||
| beta_raw_data_visualization.py |
Why should these be here if they are replaced?
| EXCLUDE_SCENARIOS = {"highogs", "lowogs", "high oil and gas supply", | ||
| "low oil and gas supply"} |
Aren't these the same? Isn't highogs = high oil and gas supply? I thought you also excluded the high and low gas price scenarios?
| beta_start_year = _parse_optional_int( | ||
| beta_scen_cfg.get("start_year"), "scenarios.beta_regression.start_year" | ||
| ) |
Do you need a function for this? Can't you just convert it to an int?
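As a minimal sketch of the simpler alternative: assuming the helper exists only because `start_year` may be absent or blank in the config (the actual behavior of `_parse_optional_int` is not shown here), a small guard around `int()` would be enough. A bare `int()` call on its own would raise a `TypeError` when the key is missing and `.get()` returns `None`. The name `optional_int` and the sample config values are hypothetical.

```python
def optional_int(value):
    """Return None for a missing or blank setting, otherwise int(value)."""
    if value is None or value == "":
        return None
    return int(value)  # raises ValueError on non-numeric text such as "abc"


# Hypothetical config fragment: start_year is set, end_year is not.
beta_scen_cfg = {"start_year": "2025"}
beta_start_year = optional_int(beta_scen_cfg.get("start_year"))
beta_end_year = optional_int(beta_scen_cfg.get("end_year"))
```

If the config value is always required, the guard can be dropped and `int(beta_scen_cfg["start_year"])` used directly.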
| # ============================================================================ | ||
| # Helpers | ||
| # ============================================================================ |
Would you add some comments to the functions throughout to better explain what they are?
| X = np.column_stack(x_cols) | ||
| y = merged["dp"].to_numpy() | ||
| | ||
| # OLS |
| CENDIV_OUTPUT = { | ||
| "NewEngland": "New_England", | ||
| "MiddleAtlantic": "Mid_Atlantic", | ||
| "EastNorthCentral": "East_North_Central", | ||
| "WestNorthCentral": "West_North_Central", | ||
| "SouthAtlantic": "South_Atlantic", | ||
| "EastSouthCentral": "East_South_Central", | ||
| "WestSouthCentral": "West_South_Central", | ||
| "Mountain": "Mountain", | ||
| "Pacific": "Pacific", | ||
| } |
This kind of thing is also in the beta regression script. Can it be shared in order to not be duplicated?
| # ============================================================================ | ||
| # Utility Functions | ||
| # ============================================================================ |
Would it make sense to move all functions to their own script, and then have the alpha and beta regression scripts import from there? Only recommending if that would help remove some duplication.
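One possible shape for that shared script is sketched below. The module name `ng_common.py` and the helper `cendiv_output_name` are hypothetical; the mapping itself is copied from the snippet quoted earlier in this review. Both regression scripts would then use `from ng_common import CENDIV_OUTPUT, cendiv_output_name` instead of defining their own copies.

```python
# ng_common.py (hypothetical shared module for both regression scripts)

# Census-division labels as they appear in the source data, mapped to the
# labels used in the output files.
CENDIV_OUTPUT = {
    "NewEngland": "New_England",
    "MiddleAtlantic": "Mid_Atlantic",
    "EastNorthCentral": "East_North_Central",
    "WestNorthCentral": "West_North_Central",
    "SouthAtlantic": "South_Atlantic",
    "EastSouthCentral": "East_South_Central",
    "WestSouthCentral": "West_South_Central",
    "Mountain": "Mountain",
    "Pacific": "Pacific",
}


def cendiv_output_name(source_name: str) -> str:
    """Return the output label for a census division, failing loudly if unknown."""
    try:
        return CENDIV_OUTPUT[source_name]
    except KeyError:
        raise KeyError(f"Unknown census division: {source_name!r}") from None
```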
| "exclude_aliases": [ | ||
| "highogs", | ||
| "highprice", | ||
| "lowprice", | ||
| "lowogs" |
This seems to be in conflict with the exclusion in the python script.
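One way to avoid the two lists drifting apart is to keep the exclusions only in the config and have the script read them. The sketch below assumes the list lives at `scenarios.beta_regression.exclude_aliases`, as described in the PR text; `normalize`, `load_excluded`, and `is_excluded` are hypothetical helper names, and whether the high/low price scenarios belong in the list is a modeling decision this sketch does not settle.

```python
import json


def normalize(name: str) -> str:
    """Lowercase and drop spaces/punctuation so 'High OGS' matches 'highogs'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def load_excluded(config_text: str) -> set:
    """Read the exclusion aliases from the pipeline config (single source of truth)."""
    cfg = json.loads(config_text)
    aliases = cfg["scenarios"]["beta_regression"]["exclude_aliases"]
    return {normalize(alias) for alias in aliases}


def is_excluded(scenario: str, excluded: set) -> bool:
    return normalize(scenario) in excluded


# Inline stand-in for aeo_pipeline_config.json, using the aliases quoted above.
sample_config = json.dumps({
    "scenarios": {"beta_regression": {
        "exclude_aliases": ["highogs", "highprice", "lowprice", "lowogs"]
    }}
})
excluded = load_excluded(sample_config)
```

With this layout the hard-coded `EXCLUDE_SCENARIOS` set in the script would no longer be needed. Long-form names such as "high oil and gas supply" would have to be listed as aliases in the config if they can appear in the data.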
| 2. Run **beta regression first**. | ||
| | ||
| ```bash | ||
| python aeo_beta_regression.py --config aeo_pipeline_config.json |
Do you need these given that run_ng_pipeline.py does all of it for you?
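For context, a one-command runner of this kind usually amounts to calling each stage in order and stopping at the first failure. The sketch below is not the PR's `run_ng_pipeline.py`. It assumes the stage order given in the file summary (beta, sync, alpha, visualization) and that every stage accepts the `--config` flag shown in the README snippet; that `visualization.py` takes the flag is an assumption.

```python
import subprocess
import sys

# Stage scripts in the order they must run.
STAGES = [
    "aeo_beta_regression.py",
    "sync_beta_to_alpha_inputs.py",
    "aeo_alpha_regression.py",
    "visualization.py",
]


def build_commands(config_path: str) -> list:
    """Build one command per stage, using the current Python interpreter."""
    return [[sys.executable, stage, "--config", config_path] for stage in STAGES]


def run_pipeline(config_path: str) -> None:
    """Run every stage in order; check=True aborts on the first non-zero exit."""
    for cmd in build_commands(config_path):
        subprocess.run(cmd, check=True)


commands = build_commands("aeo_pipeline_config.json")
```

If the runner already covers all four stages, the README could lead with the single runner command and keep the per-stage commands only as a troubleshooting note.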
Pull request overview
Adds a new automated, config-driven natural gas price regression preprocessing pipeline under aeo_updates/natural_gas_price_regression/ to generate ReEDS-ready inputs (betas, alphas, prices, and demands) from EIA AEO data, with built-in diagnostics and validation.
Changes:
- Introduces end-to-end beta + alpha regression scripts, a one-command runner, and a unified visualization/validation tool.
- Adds config + documentation and organizes historical input CSVs and output directories for repeatable runs.
- Updates `aeo_updates/README.md` to point NG price/demand updates to the new pipeline.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| aeo_updates/natural_gas_price_regression/aeo_beta_regression.py | Stage 1: joint fixed-effects beta estimation + beta-step alpha export |
| aeo_updates/natural_gas_price_regression/sync_beta_to_alpha_inputs.py | Copies beta outputs into alpha step inputs |
| aeo_updates/natural_gas_price_regression/aeo_alpha_regression.py | Stage 2: scenario-specific alpha computation + historical backfill |
| aeo_updates/natural_gas_price_regression/visualization.py | Unified plotting + parity/validation CSVs |
| aeo_updates/natural_gas_price_regression/run_ng_pipeline.py | Cross-platform runner to execute all stages in order |
| aeo_updates/natural_gas_price_regression/aeo_pipeline_config.json | Single config file for scenarios/paths/API/options |
| aeo_updates/natural_gas_price_regression/README.md | Run instructions and configuration notes |
| aeo_updates/natural_gas_price_regression/.gitignore | Ignores generated outputs/plots and intermediate copied inputs |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/README.md | Documents historical/manual vs auto-generated inputs |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/st_cendiv.csv | State → census division mapping |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/ng_AEO_historical.csv | Historical NG prices used for backfill |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/ng_demand_AEO_historical.csv | Historical electric-sector NG demand backfill |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/ng_tot_demand_AEO_historical.csv | Historical total NG demand backfill |
| aeo_updates/natural_gas_price_regression/outputs of beta regression/README.md | Documents intermediate beta-step outputs |
| aeo_updates/natural_gas_price_regression/outputs of alpha regression/README.md | Documents final outputs and how to copy into ReEDS |
| aeo_updates/README.md | Replaces manual NG spreadsheet workflow instructions with pointer to the new pipeline |
| aeo_updates/NG Prices Preprocessing for AEO Inputs.xlsx | Included in PR contents (legacy manual workflow artifact) |
| "base_url": "https://api.eia.gov/v2", | ||
| "key_env_var": "EIA_API_KEY", | ||
| "key_fallback": "", | ||
| "verify_ssl": false, |
api.verify_ssl is set to false in the default config. Disabling TLS verification is a security risk (susceptible to MITM) and can also mask certificate/endpoint issues. Safer default is true, with users explicitly opting out only for controlled environments (and ideally with a prominent warning when disabled).
| "verify_ssl": false, | |
| "verify_ssl": true, |
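A minimal sketch of the "prominent warning when disabled" idea, assuming the `api` section of the config is available as a dict. `resolve_verify_ssl` is a hypothetical helper, not something in the PR.

```python
import warnings


def resolve_verify_ssl(api_cfg: dict) -> bool:
    """Default to verified TLS and warn loudly when the config opts out."""
    verify = api_cfg.get("verify_ssl", True)
    if verify is False:
        warnings.warn(
            "TLS certificate verification is disabled (api.verify_ssl=false); "
            "use this only on a trusted network.",
            RuntimeWarning,
            stacklevel=2,
        )
    return bool(verify)


# Hypothetical api section matching the suggested default.
verify = resolve_verify_ssl({"base_url": "https://api.eia.gov/v2", "verify_ssl": True})
```

The returned flag could then be passed as the `verify=` argument of the HTTP client call, for example `requests.get`, if that is what the scripts use.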
| Price(r,t,s) = Alpha(r,t,s) + Beta_regional(r) × Demand_regional(r,t,s) | ||
| + Beta_national × Demand_national(t,s) |
The module docstring contains mojibake characters (e.g., "Ã—" instead of the intended multiplication sign "×" or "*"). This makes the documentation misleading and suggests the file may have been saved with the wrong encoding (note also the BOM marker at the start of the file). Please replace these sequences with the intended characters and ensure the file is UTF-8 encoded.
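This particular pattern typically arises when UTF-8 bytes are decoded as Windows-1252, and it can often be reversed mechanically. The sketch below assumes that is what happened to this file. `repair_mojibake` and `strip_bom` are hypothetical helpers, and the garbled sample is written with escapes so the example itself stays unambiguous.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as Windows-1252; leave other text alone."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def strip_bom(text: str) -> str:
    """Drop a leading byte-order mark if one is present."""
    return text[1:] if text.startswith("\ufeff") else text


# "\u00c3\u2014" is the two-character sequence shown in the docstring above.
garbled = "\ufeffBeta_regional(r) \u00c3\u2014 Demand_regional(r,t,s)"
fixed = repair_mojibake(strip_bom(garbled))
```

Reading the file with `encoding="utf-8-sig"` and writing it back with `encoding="utf-8"` would also remove the BOM.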
| print("\nNext commands:") | ||
| print(" python sync_beta_to_alpha_inputs.py --config aeo_pipeline_config.json") | ||
| print(" python aeo_alpha_regression.py --config aeo_pipeline_config.json") | ||
| print(" python results_visualization.py --config aeo_pipeline_config.json") | ||
| print("=" * 60) |
This script still instructs users to run results_visualization.py, but that script is marked as replaced by visualization.py elsewhere in this PR. Update the printed next-step command (and the docstring output list) to reference visualization.py so the guidance matches the new pipeline.
| for s in scenarios | ||
| ] | ||
| fig.supxlabel("Natural Gas Demand (quads)", fontsize=11) | ||
| fig.supylabel("Natural Gas Price (2024 $/MMBtu)", fontsize=11) |
The raw-scatter plot y-axis label is hard-coded as "Natural Gas Price (2024 $/MMBtu)".
| fig.supylabel("Natural Gas Price (2024 $/MMBtu)", fontsize=11) | |
| fig.supylabel("Natural Gas Price (AEO $/MMBtu)", fontsize=11) |
| # Beta regression outputs (all generated, reproducible) | ||
| outputs of beta regression/*.csv | ||
| outputs of beta regression/raw_aeo_data/ | ||
| outputs of beta regression/bata raw data visualization/ |
Typo in ignored directory name: "bata raw data visualization" looks unintended and likely won't match the actual output directory. Fixing this prevents generated artifacts from being accidentally committed.
| outputs of beta regression/bata raw data visualization/ | |
| outputs of beta regression/beta raw data visualization/ |
Summary
Adds a new automated preprocessing pipeline under `aeo_updates/natural_gas_price_regression/` that converts EIA Annual Energy Outlook (AEO) natural gas price and demand data into ReEDS model inputs. This replaces any previous manual or ad-hoc workflow for updating NG price regression inputs.
Technical details
The pipeline implements a two-stage linear decomposition of regional natural gas prices:
- Beta regression (`aeo_beta_regression.py`): Estimates the structural demand-price elasticities via demeaned fixed-effects OLS across multiple AEO scenarios. Demeaning is performed at the (region, year) level to remove fixed-effect intercepts and allow joint estimation of 9 regional betas and 1 national beta from cross-scenario variation. High/Low Oil & Gas scenarios are excluded from this step to avoid distorting the structural coefficients.
- Alpha regression (`aeo_alpha_regression.py`): Fetches scenario-specific prices from the EIA API and computes the residual alpha (base price intercept) for each (region, year, scenario) after accounting for beta terms. Historical years (pre-projection) are backfilled from local CSVs identically across all scenarios. The first modeled year receives special treatment: beta contributions are zeroed and alpha absorbs the full price level, consistent with how ReEDS initializes the NG price model.
All monetary values are deflated to 2004 dollars using a configurable deflator.
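To make the two stages concrete, here is a minimal sketch on synthetic, noise-free data with 2 regions instead of 9. It illustrates the technique described above (demeaning within each (region, year) group, joint least squares, then alpha as the residual). It is not the code in `aeo_beta_regression.py` or `aeo_alpha_regression.py`; all names and numbers are made up for illustration, and the first-year special case is not modeled.

```python
import numpy as np


def estimate_betas(region, year, d_reg, d_nat, price, n_regions):
    """Jointly estimate one beta per region plus one national beta.

    Every argument except n_regions is a 1-D array with one entry per
    (region, year, scenario) row; region and year are integer codes.
    """
    # Rows sharing a (region, year) pair form one fixed-effect group.
    keys = region * (year.max() + 1) + year
    _, group = np.unique(keys, return_inverse=True)
    counts = np.bincount(group)

    def demean(values):
        # Subtracting the group mean removes the (region, year) intercept.
        return values - (np.bincount(group, weights=values) / counts)[group]

    dp, dr, dn = demean(price), demean(d_reg), demean(d_nat)
    # One regional-demand column per region (zero elsewhere) plus national demand.
    columns = [np.where(region == r, dr, 0.0) for r in range(n_regions)]
    X = np.column_stack(columns + [dn])
    coef, *_ = np.linalg.lstsq(X, dp, rcond=None)
    return coef[:n_regions], coef[n_regions]


# Synthetic data: 2 regions, 3 years, 4 scenarios, no noise.
rng = np.random.default_rng(0)
R, T, S = 2, 3, 4
grids = np.meshgrid(np.arange(R), np.arange(T), np.arange(S), indexing="ij")
region, year, scen = (g.ravel() for g in grids)
d_reg = rng.uniform(1.0, 5.0, size=region.size)
d_nat = rng.uniform(20.0, 30.0, size=(T, S))[year, scen]
true_reg, true_nat = np.array([0.5, 0.8]), 0.1
alpha_true = rng.uniform(2.0, 4.0, size=(R, T))[region, year]
price = alpha_true + true_reg[region] * d_reg + true_nat * d_nat

beta_reg, beta_nat = estimate_betas(region, year, d_reg, d_nat, price, R)
# Stage 2: alpha is whatever price remains after the beta terms.
alpha = price - beta_reg[region] * d_reg - beta_nat * d_nat
```

Because the synthetic intercept does not vary across scenarios, demeaning removes it exactly and the estimated betas match the values used to build the data. With real AEO data the fit is approximate, and alpha absorbs the remainder for each (region, year, scenario).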
Implementation notes
New files:
- `aeo_beta_regression.py`
- `sync_beta_to_alpha_inputs.py`
- `aeo_alpha_regression.py`
- `visualization.py`
- `aeo_pipeline_config.json`
- `run_ng_pipeline.py`
- `README.md`
- `.gitignore`
- `inputs for alpha regression/README.md`
- `outputs of alpha regression/README.md`
- `outputs of beta regression/README.md`
Pipeline data flow:
Final outputs (written to `outputs of alpha regression/`):
- (`cd_beta0.csv`, `national_beta.csv`)
Config-driven flexibility:
- `aeo_year`, `start_year`, `end_year` control vintage and projection horizon
- `scenarios.beta_regression.include`/`exclude_aliases` control which scenarios feed beta estimation
- `scenarios.alpha_regression.fetch` and `outputs` control which scenarios are output and their file suffixes
Historical data maintenance:
- `inputs for alpha regression/`
API key: Read from the `EIA_API_KEY` environment variable. Not stored in config or source. See README for setup instructions.
Additional changes
- `aeo_updates/natural_gas_price_regression/` as a new self-contained subdirectory; no changes to existing scripts elsewhere in the repo.
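As a sketch of the API key handling noted under Implementation notes (read from the `EIA_API_KEY` environment variable, never from config or source), the lookup could look like the following. `get_api_key` is a hypothetical helper and the error text is illustrative, not the scripts' actual message.

```python
import os


def get_api_key(env_var: str = "EIA_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the pipeline."
        )
    return key
```

Failing before any request is made keeps a missing key from surfacing later as an opaque HTTP error.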
Update gas price elasticities #30 of ReEDS repo.
Relevant sources or documentation
Additional details
To run after cloning:
Slides
NG price elasticity update -2025.pptx