
Add natural gas price regression pipeline#2

Open
Yunzhi-Chen wants to merge 14 commits into main from yc/natural_gas_update

Conversation


@Yunzhi-Chen Yunzhi-Chen commented Apr 16, 2026

Summary

Adds a new automated preprocessing pipeline under aeo_updates/natural_gas_price_regression/ that converts EIA Annual Energy Outlook (AEO) natural gas price and demand data into ReEDS model inputs. This replaces the previous manual spreadsheet workflow for updating natural gas (NG) price regression inputs.


Technical details

The pipeline implements a two-stage linear decomposition of regional natural gas prices:

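$$\text{Price}(r,t,s) = \alpha(r,t,s) + \beta_{reg}(r) \cdot Q_{reg}(r,t,s) + \beta_{nat} \cdot Q_{nat}(t,s)$$

where $r$ is the region, $t$ the year, $s$ the AEO scenario, $Q_{reg}$ the regional demand, and $Q_{nat}$ the national demand.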

  • Beta regression (aeo_beta_regression.py): Estimates the structural demand-price elasticities via demeaned fixed-effects OLS across multiple AEO scenarios. Demeaning is performed at the (region, year) level to remove fixed-effect intercepts and allow joint estimation of 9 regional betas and 1 national beta from cross-scenario variation. High/Low Oil & Gas scenarios are excluded from this step to avoid distorting the structural coefficients.

  • Alpha regression (aeo_alpha_regression.py): Fetches scenario-specific prices from the EIA API and computes the residual alpha (base price intercept) for each (region, year, scenario) after accounting for beta terms. Historical years (pre-projection) are backfilled from local CSVs identically across all scenarios. The first modeled year receives special treatment: beta contributions are zeroed and alpha absorbs the full price level, consistent with how ReEDS initializes the NG price model.
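The demeaning idea in the beta step can be illustrated with a minimal sketch. It is not the code in aeo_beta_regression.py: the data are synthetic, three regions stand in for the nine, and all array names are assumptions. It only shows why subtracting the cross-scenario mean within each (region, year) cell removes the intercept and lets the regional and national betas be estimated jointly by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic problem size (the real pipeline uses 9 regions; 3 keeps this short).
n_regions, n_years, n_scen = 3, 4, 5
true_beta_reg = np.array([0.5, 1.0, 1.5])
true_beta_nat = 0.2

# Regional demand Q_reg[r, t, s]; national demand is the sum over regions.
q_reg = rng.uniform(1.0, 3.0, size=(n_regions, n_years, n_scen))
q_nat = q_reg.sum(axis=0)  # shape (n_years, n_scen)

# A fixed effect per (region, year) that is identical across scenarios.
alpha_rt = rng.uniform(2.0, 5.0, size=(n_regions, n_years, 1))
price = (
    alpha_rt
    + true_beta_reg[:, None, None] * q_reg
    + true_beta_nat * q_nat[None, :, :]
)

# Demean across scenarios within each (region, year) cell.
dp = price - price.mean(axis=2, keepdims=True)
dq_reg = q_reg - q_reg.mean(axis=2, keepdims=True)
dq_nat = q_nat - q_nat.mean(axis=1, keepdims=True)

# Design matrix: one column per regional beta plus one shared national column.
block_len = n_years * n_scen
X = np.zeros((n_regions * block_len, n_regions + 1))
y = dp.reshape(-1)
for r in range(n_regions):
    rows = slice(r * block_len, (r + 1) * block_len)
    X[rows, r] = dq_reg[r].reshape(-1)
    X[rows, n_regions] = dq_nat.reshape(-1)

# Ordinary least squares on the demeaned data.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_reg_hat = coef[:n_regions]
beta_nat_hat = coef[n_regions]
```

Because the synthetic prices contain no noise, the estimates match the true coefficients up to floating-point error. With real AEO data the fit would carry residuals.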

All monetary values are deflated to 2004 dollars using a configurable deflator.
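As a rough illustration of the alpha step and the deflation, the sketch below computes alpha as the residual of the price equation, applies the first-modeled-year rule, and converts the result to 2004 dollars. The function names, the numeric inputs, and the convention that the deflator is a multiplicative factor are all assumptions for illustration, not the pipeline's actual API.

```python
def alpha_residual(price, q_reg, q_nat, beta_reg, beta_nat, first_modeled_year=False):
    """Return the alpha implied by one (region, year, scenario) price."""
    if first_modeled_year:
        # Beta contributions are zeroed, so alpha absorbs the full price level.
        return price
    return price - beta_reg * q_reg - beta_nat * q_nat


def to_2004_dollars(value, deflator):
    """Convert a dollar value to 2004 dollars (assumed: multiply by the deflator)."""
    return value * deflator


# Hypothetical inputs for a single (region, year, scenario) cell.
price, q_reg, q_nat = 4.0, 2.0, 10.0
beta_reg, beta_nat = 0.5, 0.1

alpha = alpha_residual(price, q_reg, q_nat, beta_reg, beta_nat)
alpha_first = alpha_residual(price, q_reg, q_nat, beta_reg, beta_nat,
                             first_modeled_year=True)
alpha_2004 = to_2004_dollars(alpha, deflator=0.6)  # 0.6 is a made-up deflator
```

Since the price equation is linear, deflating before or after taking the residual gives the same alpha, provided the betas are expressed in the same dollar year.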


Implementation notes

New files:

| File | Role |
| --- | --- |
| aeo_beta_regression.py | Stage 1: joint fixed-effects beta estimation |
| sync_beta_to_alpha_inputs.py | Stage 2: copies beta outputs to alpha input directory |
| aeo_alpha_regression.py | Stage 3: scenario-specific alpha computation |
| visualization.py | Stage 4: diagnostic plots and parity validation |
| aeo_pipeline_config.json | Single config file controlling all four stages |
| run_ng_pipeline.py | Cross-platform pipeline runner (exits on first failure) |
| README.md | Run instructions, prerequisites, and config reference |
| .gitignore | Excludes generated outputs, raw API dumps, and plots |
| inputs for alpha regression/README.md | Documents historical CSVs and auto-generated beta inputs |
| outputs of alpha regression/README.md | Final output instructions for copying to ReEDS |
| outputs of beta regression/README.md | Documents intermediate beta regression outputs |

Pipeline data flow:

Final outputs (written to outputs of alpha regression/):

  • Per-scenario NG prices, electric sector demand, total demand, and alpha intercepts (3 scenarios × 4 file types by default: reference, HOG, LOG)
  • Shared beta coefficient tables (cd_beta0.csv, national_beta.csv)

Config-driven flexibility:

  • aeo_year, start_year, end_year control vintage and projection horizon
  • scenarios.beta_regression.include / exclude_aliases control which scenarios feed beta estimation
  • scenarios.alpha_regression.fetch and outputs control which scenarios are output and their file suffixes
  • Designed to support additional output scenarios with no code changes
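A minimal sketch of how these config keys might be consumed is shown below. The key names come from the list above; the values, the scenario identifiers, the suffixes, and the rule that an empty `include` means "all available scenarios" are assumptions, so treat this as an illustration rather than the actual schema of aeo_pipeline_config.json.

```python
import json

# Hypothetical config fragment; only the key names are taken from the PR text.
CONFIG_TEXT = """
{
  "aeo_year": 2025,
  "start_year": 2010,
  "end_year": 2050,
  "scenarios": {
    "beta_regression": {
      "include": [],
      "exclude_aliases": ["highogs", "highprice", "lowprice", "lowogs"]
    },
    "alpha_regression": {
      "fetch": ["ref2025", "highogs", "lowogs"],
      "outputs": {"ref2025": "reference", "highogs": "HOG", "lowogs": "LOG"}
    }
  }
}
"""


def beta_scenarios(cfg, available):
    """Pick the scenarios that feed beta estimation."""
    beta_cfg = cfg["scenarios"]["beta_regression"]
    # Assumption: an empty include list means "use everything available".
    include = beta_cfg.get("include") or available
    excluded = {alias.casefold() for alias in beta_cfg.get("exclude_aliases", [])}
    return [name for name in include if name.casefold() not in excluded]


cfg = json.loads(CONFIG_TEXT)
years = list(range(cfg["start_year"], cfg["end_year"] + 1))
selected = beta_scenarios(
    cfg, ["ref2025", "highogs", "lowogs", "highmacro", "lowmacro"]
)
output_suffixes = cfg["scenarios"]["alpha_regression"]["outputs"]
```

Under this layout, adding an output scenario would mean adding one entry to `fetch` and one to `outputs`, which is the "no code changes" property the description claims.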

Historical data maintenance:

  • Pre-projection years (2010–2023) are provided via local CSV files in inputs for alpha regression/
  • Each pipeline run automatically appends the current AEO's calibration year (e.g., AEO 2025 appends 2024) to the history CSVs, so the next AEO version has seamless year coverage with no manual update needed
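The append step could look roughly like the sketch below. It assumes the calibration year is the AEO year minus one (as in the AEO 2025 example), that each history row carries a `year` field, and that re-running the pipeline should not duplicate a year. The function and column names are hypothetical.

```python
def append_calibration_year(history_rows, aeo_year, new_row):
    """Return history with the calibration year appended, unless already present."""
    calibration_year = aeo_year - 1  # e.g. AEO 2025 -> 2024 (assumed rule)
    if any(int(row["year"]) == calibration_year for row in history_rows):
        # Re-running the pipeline should leave the history unchanged.
        return list(history_rows)
    return list(history_rows) + [{**new_row, "year": calibration_year}]


# Hypothetical history rows; real files hold one column per census division.
history = [{"year": 2022, "Pacific": 3.1}, {"year": 2023, "Pacific": 2.9}]
updated = append_calibration_year(history, 2025, {"Pacific": 3.0})
rerun = append_calibration_year(updated, 2025, {"Pacific": 9.9})
```

Checking for the year before appending keeps the step idempotent, so repeated runs for the same AEO vintage do not grow the CSVs.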

API key: Read from the EIA_API_KEY environment variable. Not stored in config or source. See README for setup instructions.
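A small sketch of the environment-variable lookup follows. The variable name `EIA_API_KEY` is from the description and the `fallback` parameter mirrors the `key_fallback` config entry quoted later in the review; the function name and the error message are assumptions.

```python
import os


def get_api_key(env_var="EIA_API_KEY", fallback=""):
    """Read the EIA API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip() or fallback
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the pipeline."
        )
    return key
```

Failing before any request is made gives a clearer message than an authentication error returned by the API halfway through a run.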


Additional changes

  • Adds aeo_updates/natural_gas_price_regression/ as a new self-contained subdirectory; no changes to existing scripts elsewhere in the repo.

Issues resolved

Update gas price elasticities #30 of ReEDS repo.


Relevant sources or documentation


Additional details

To run after cloning:

export EIA_API_KEY=your_key_here   # or `set EIA_API_KEY=your_key_here` on Windows
cd "aeo_updates/natural_gas_price_regression"
python run_ng_pipeline.py
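For readers who want a picture of what "exits on first failure" means, a runner of this kind can be sketched as below. The script names and the `--config` flag appear elsewhere in this PR; the function itself and the injectable `runner` argument are illustrative assumptions, not the contents of run_ng_pipeline.py.

```python
import subprocess
import sys

STAGES = [
    "aeo_beta_regression.py",
    "sync_beta_to_alpha_inputs.py",
    "aeo_alpha_regression.py",
    "visualization.py",
]


def run_pipeline(stages=STAGES, config="aeo_pipeline_config.json",
                 runner=subprocess.run):
    """Run each stage in order; stop and return the first non-zero exit code."""
    for script in stages:
        # sys.executable keeps the same interpreter on Windows, macOS, and Linux.
        result = runner([sys.executable, script, "--config", config])
        if result.returncode != 0:
            print(f"Stage failed: {script} (exit code {result.returncode})")
            return result.returncode
    return 0

# Typical use from the pipeline directory: sys.exit(run_pipeline())
```

Passing the runner as an argument is only a convenience for testing the control flow without launching the real scripts.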

Slides

NG price elasticity update -2025.pptx

@Yunzhi-Chen Yunzhi-Chen self-assigned this Apr 16, 2026
@Yunzhi-Chen Yunzhi-Chen requested a review from wesleyjcole April 16, 2026 21:44

@wesleyjcole wesleyjcole left a comment


Some comments on the PR text:

  • Would you write out OLS?
  • You note fetching the scenario-specific prices from the API in the alpha regression step, but don't you have to do that first because they are needed for the beta regression step?
  • You list "ReEDS natural gas price model documentation (internal)" as a relevant source -- I don't know what this is pointing to, and I thought our documentation for the NG supply curves was in the appendix of the model documentation.
  • The template includes an AI use disclosure statement that I didn't see here--that would be good to add.

The Python code seems a little verbose, or at least it seems like what is happening shouldn't need so many lines and functions throughout the scripts. Given that you have been using AI for these kinds of tasks, you might ask it if it can accomplish the same actions with less code and see what happens. I'd be interested in some consolidation if that's easily feasible.

Comment on lines +9 to +11
- `ng_AEO_historical.csv` — Historical NG prices
- `ng_demand_AEO_historical.csv` — Historical electric sector NG demand
- `ng_tot_demand_AEO_historical.csv` — Historical total sector NG demand

Where are these files from? Are they just the files from the ReEDS repository?

Comment on lines +32 to +35
# Old scripts (replaced by visualization.py)
results_visualization.py
results_validation.py
beta_raw_data_visualization.py

Why should these be here if they are replaced?

Comment on lines +89 to +90
EXCLUDE_SCENARIOS = {"highogs", "lowogs", "high oil and gas supply",
"low oil and gas supply"}

Aren't these the same? Isn't highogs = high oil and gas supply? I thought you also excluded the high and low gas price scenarios?
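One way to address the duplication is to normalize scenario names to a single canonical key before checking the exclusion set. The sketch below assumes, as the question suggests, that "highogs" and "high oil and gas supply" refer to the same scenario; that equivalence and all names here are assumptions to be confirmed against the EIA scenario list.

```python
# Assumed alias table: long display names map to short canonical keys.
SCENARIO_ALIASES = {
    "high oil and gas supply": "highogs",
    "low oil and gas supply": "lowogs",
}

EXCLUDED = {"highogs", "lowogs"}


def canonical_scenario(name):
    """Lowercase, collapse whitespace, and map known aliases to one key."""
    key = " ".join(name.casefold().split())
    return SCENARIO_ALIASES.get(key, key)


def is_excluded(name):
    """True if the scenario should be left out of beta estimation."""
    return canonical_scenario(name) in EXCLUDED
```

With a single alias table, the exclusion set only needs the canonical keys, and extra exclusions such as the price scenarios become one-line additions.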

Comment on lines +582 to +584
beta_start_year = _parse_optional_int(
beta_scen_cfg.get("start_year"), "scenarios.beta_regression.start_year"
)

Do you need a function for this? Can't you just convert it to an int?
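For context on the trade-off, a helper like this usually exists because a bare `int(...)` raises on `None` or an empty string, which are common ways to write "not set" in JSON. The body of `_parse_optional_int` is not shown in this thread, so the sketch below is a guess at the behavior such a helper might have, not the PR's implementation.

```python
def parse_optional_int(value, field_name):
    """Return None for unset values, otherwise an int; name the field on error."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return None
    try:
        return int(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(
            f"{field_name} must be an integer or null, got {value!r}"
        ) from exc
```

If the config always supplies an integer for this key, plain `int(beta_scen_cfg["start_year"])` would be enough and the helper could be dropped.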

Comment on lines +93 to +95
# ============================================================================
# Helpers
# ============================================================================

Would you add some comments to the functions throughout to better explain what they are?

X = np.column_stack(x_cols)
y = merged["dp"].to_numpy()

# OLS

I'd spell out OLS throughout.

Comment on lines +98 to +108
CENDIV_OUTPUT = {
"NewEngland": "New_England",
"MiddleAtlantic": "Mid_Atlantic",
"EastNorthCentral": "East_North_Central",
"WestNorthCentral": "West_North_Central",
"SouthAtlantic": "South_Atlantic",
"EastSouthCentral": "East_South_Central",
"WestSouthCentral": "West_South_Central",
"Mountain": "Mountain",
"Pacific": "Pacific",
}

This kind of thing is also in the beta regression script. Can it be shared in order to not be duplicated?

Comment on lines +127 to +129
# ============================================================================
# Utility Functions
# ============================================================================

Would it make sense to move all functions to their own script, and then have the alpha and beta regression scripts import from there? Only recommending if that would help remove some duplication.
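A shared module along these lines might look like the sketch below. The module name `ng_common.py` and the helper `to_output_name` are hypothetical; the mapping is the `CENDIV_OUTPUT` dictionary quoted in the earlier comment.

```python
# Contents of a hypothetical shared module, e.g. ng_common.py.
# Both regression scripts could then use:
#     from ng_common import CENDIV_OUTPUT, to_output_name

CENDIV_OUTPUT = {
    "NewEngland": "New_England",
    "MiddleAtlantic": "Mid_Atlantic",
    "EastNorthCentral": "East_North_Central",
    "WestNorthCentral": "West_North_Central",
    "SouthAtlantic": "South_Atlantic",
    "EastSouthCentral": "East_South_Central",
    "WestSouthCentral": "West_South_Central",
    "Mountain": "Mountain",
    "Pacific": "Pacific",
}


def to_output_name(cendiv):
    """Translate a census-division key to its output column name."""
    try:
        return CENDIV_OUTPUT[cendiv]
    except KeyError:
        raise KeyError(f"Unknown census division: {cendiv!r}") from None
```

Keeping the mapping in one place means a renamed division only has to be updated once, and both scripts fail in the same way on an unknown key.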

Comment on lines +22 to +26
"exclude_aliases": [
"highogs",
"highprice",
"lowprice",
"lowogs"

This seems to be in conflict with the exclusion in the python script.

2. Run **beta regression first**.

```bash
python aeo_beta_regression.py --config aeo_pipeline_config.json
```

Do you need these given that run_ng_pipeline.py does all of it for you?


Copilot AI left a comment


Pull request overview

Adds a new automated, config-driven natural gas price regression preprocessing pipeline under aeo_updates/natural_gas_price_regression/ to generate ReEDS-ready inputs (betas, alphas, prices, and demands) from EIA AEO data, with built-in diagnostics and validation.

Changes:

  • Introduces end-to-end beta + alpha regression scripts, a one-command runner, and a unified visualization/validation tool.
  • Adds config + documentation and organizes historical input CSVs and output directories for repeatable runs.
  • Updates aeo_updates/README.md to point NG price/demand updates to the new pipeline.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| aeo_updates/natural_gas_price_regression/aeo_beta_regression.py | Stage 1: joint fixed-effects beta estimation + beta-step alpha export |
| aeo_updates/natural_gas_price_regression/sync_beta_to_alpha_inputs.py | Copies beta outputs into alpha step inputs |
| aeo_updates/natural_gas_price_regression/aeo_alpha_regression.py | Stage 2: scenario-specific alpha computation + historical backfill |
| aeo_updates/natural_gas_price_regression/visualization.py | Unified plotting + parity/validation CSVs |
| aeo_updates/natural_gas_price_regression/run_ng_pipeline.py | Cross-platform runner to execute all stages in order |
| aeo_updates/natural_gas_price_regression/aeo_pipeline_config.json | Single config file for scenarios/paths/API/options |
| aeo_updates/natural_gas_price_regression/README.md | Run instructions and configuration notes |
| aeo_updates/natural_gas_price_regression/.gitignore | Ignores generated outputs/plots and intermediate copied inputs |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/README.md | Documents historical/manual vs auto-generated inputs |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/st_cendiv.csv | State → census division mapping |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/ng_AEO_historical.csv | Historical NG prices used for backfill |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/ng_demand_AEO_historical.csv | Historical electric-sector NG demand backfill |
| aeo_updates/natural_gas_price_regression/inputs for alpha regression/ng_tot_demand_AEO_historical.csv | Historical total NG demand backfill |
| aeo_updates/natural_gas_price_regression/outputs of beta regression/README.md | Documents intermediate beta-step outputs |
| aeo_updates/natural_gas_price_regression/outputs of alpha regression/README.md | Documents final outputs and how to copy into ReEDS |
| aeo_updates/README.md | Replaces manual NG spreadsheet workflow instructions with pointer to the new pipeline |
| aeo_updates/NG Prices Preprocessing for AEO Inputs.xlsx | Included in PR contents (legacy manual workflow artifact) |


"base_url": "https://api.eia.gov/v2",
"key_env_var": "EIA_API_KEY",
"key_fallback": "",
"verify_ssl": false,

Copilot AI Apr 24, 2026


api.verify_ssl is set to false in the default config. Disabling TLS verification is a security risk (susceptible to MITM) and can also mask certificate/endpoint issues. Safer default is true, with users explicitly opting out only for controlled environments (and ideally with a prominent warning when disabled).

Suggested change
"verify_ssl": false,
"verify_ssl": true,

Copilot uses AI. Check for mistakes.
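The "prominent warning when disabled" suggested in the verify_ssl comment could be sketched as follows. The config key `verify_ssl` is from the quoted snippet; the function name, the default of `True`, and the warning text are assumptions.

```python
import warnings


def resolve_verify_ssl(api_cfg):
    """Return the TLS verification flag, defaulting to True and warning if disabled."""
    verify = bool(api_cfg.get("verify_ssl", True))
    if not verify:
        warnings.warn(
            "TLS certificate verification is disabled (api.verify_ssl = false); "
            "use this only in controlled environments.",
            stacklevel=2,
        )
    return verify
```

The returned flag would then be passed to whatever HTTP client the pipeline uses, so the opt-out stays explicit in the config and visible in the run log.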
Comment on lines +15 to +16
Price(r,t,s) = Alpha(r,t,s) + Beta_regional(r) × Demand_regional(r,t,s)
+ Beta_national × Demand_national(t,s)

Copilot AI Apr 24, 2026


The module docstring contains mojibake characters (e.g., "Ã—" instead of the intended multiplication sign "×" or "*"). This makes the documentation misleading and suggests the file may have been saved with the wrong encoding (note also the BOM marker at the start of the file). Please replace these sequences with the intended characters and ensure the file is UTF-8 encoded.
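The short sketch below shows how this kind of mojibake typically arises and how a BOM can be stripped on read. It assumes the bytes were misread as Windows-1252, which is a common cause but not confirmed for this file.

```python
import codecs

intended = "\u00d7"  # the multiplication sign

# UTF-8 bytes misread as Windows-1252 produce a two-character sequence.
garbled = intended.encode("utf-8").decode("cp1252")

# Reversing the mistake recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")

# A UTF-8 file saved with a BOM: "utf-8-sig" drops the marker when decoding.
raw = codecs.BOM_UTF8 + "Beta_national \u00d7 Demand_national".encode("utf-8")
text = raw.decode("utf-8-sig")
```

Saving the source file as UTF-8 without a BOM and using plain `*` in docstrings would avoid both problems.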

Comment on lines +692 to +696
print("\nNext commands:")
print(" python sync_beta_to_alpha_inputs.py --config aeo_pipeline_config.json")
print(" python aeo_alpha_regression.py --config aeo_pipeline_config.json")
print(" python results_visualization.py --config aeo_pipeline_config.json")
print("=" * 60)

Copilot AI Apr 24, 2026


This script still instructs users to run results_visualization.py, but that script is marked as replaced by visualization.py elsewhere in this PR. Update the printed next-step command (and the docstring output list) to reference visualization.py so the guidance matches the new pipeline.

for s in scenarios
]
fig.supxlabel("Natural Gas Demand (quads)", fontsize=11)
fig.supylabel("Natural Gas Price (2024 $/MMBtu)", fontsize=11)

Copilot AI Apr 24, 2026


The raw-scatter plot y-axis label is hard-coded as "Natural Gas Price (2024 $/MMBtu)", but the AEO dollar year can change by vintage/config. Consider making the dollar-year label configurable (or use a neutral label like "AEO $/MMBtu"), to avoid producing misleading plot annotations when running for other AEO years.

Suggested change
fig.supylabel("Natural Gas Price (2024 $/MMBtu)", fontsize=11)
fig.supylabel("Natural Gas Price (AEO $/MMBtu)", fontsize=11)
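The configurable alternative mentioned in the comment could be sketched as a small label helper. The function name and the idea of passing the dollar year from config are assumptions; the fallback text matches the suggested change.

```python
def price_axis_label(dollar_year=None):
    """Build the price axis label, naming the dollar year only when it is known."""
    unit = f"{dollar_year} $/MMBtu" if dollar_year else "AEO $/MMBtu"
    return f"Natural Gas Price ({unit})"


# Example: fig.supylabel(price_axis_label(2024), fontsize=11)
```

This keeps the explicit year on plots when the vintage is known while avoiding a stale "2024" on runs for other AEO years.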

# Beta regression outputs (all generated, reproducible)
outputs of beta regression/*.csv
outputs of beta regression/raw_aeo_data/
outputs of beta regression/bata raw data visualization/

Copilot AI Apr 24, 2026


Typo in ignored directory name: "bata raw data visualization" looks unintended and likely won't match the actual output directory. Fixing this prevents generated artifacts from being accidentally committed.

Suggested change
outputs of beta regression/bata raw data visualization/
outputs of beta regression/beta raw data visualization/

