Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,8 @@ The data generation codes are contained in [data_gen](./pdebench/data_gen):
- `gen_radial_dam_break.py` to generate the 2D shallow-water data.
- `gen_ns_incomp.py` to generate the 2D incompressible inhomogeneous
Navier-Stokes data.
- `gen_wave.py` to generate 1D/2D wave equation and Klein-Gordon equation data.
See [WAVE_BENCHMARK.md](WAVE_BENCHMARK.md) for details and baseline results.
- `plot.py` to plot the generated data.
- `uploader.py` to upload the generated data to the data repository.
- `.env` is the environment data to store Dataverse URL and API token to upload
Expand Down
109 changes: 109 additions & 0 deletions WAVE_BENCHMARK.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
# Wave Equation & Klein-Gordon Benchmark

**Contributed by:** Greg Partin ([@gpartin](https://github.com/gpartin))

New PDE benchmark tasks for PDEBench: the **wave equation** and **Klein-Gordon equation** in 1D and 2D with periodic boundary conditions.

## Equations

**Wave equation:**
$$\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u$$

**Klein-Gordon equation:**
$$\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u - \chi^2 u$$

Both solved on periodic domain $[0, 1]^d$ with random Fourier initial conditions.

## Data Generation

```bash
# 1D wave equation, c=1.0, 1000 samples
python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=0.0

# Vary wave speed
python -m pdebench.data_gen.gen_wave sim.c=0.1
python -m pdebench.data_gen.gen_wave sim.c=0.4
python -m pdebench.data_gen.gen_wave sim.c=2.0

# Klein-Gordon with mass parameter
python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=0.5
python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=1.0
python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=2.0
python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=5.0

# 2D wave equation
python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.ndim=2 sim.xdim=128
```

## Solver Details

- **Method**: Leapfrog (Verlet) finite-difference, second-order in space and time
- **Boundary conditions**: Periodic
- **CFL**: dt = 0.5 * dx / (c * sqrt(ndim))
- **Spatial resolution**: 1024 (1D), 128×128 (2D)
- **Temporal resolution**: 201 snapshots over t ∈ [0, 2]
- **Initial conditions**: Random superposition of 5 Fourier modes, normalized to unit max amplitude

## HDF5 Format

Output follows PDEBench Format 1A:
- `"tensor"`: shape `(n_samples, t, x)` for 1D, `(n_samples, t, x, y)` for 2D
- `"x-coordinate"`: spatial grid
- `"t-coordinate"`: temporal grid

## Baseline Results

### 1D Wave Equation — FNO vs UNet

Trained with autoregressive 41-step rollout, `initial_step=10`, spatial resolution 256 (downsampled 4×), temporal resolution 51 (downsampled 4×), 100 epochs.

| Wave Speed c | FNO nRMSE | UNet nRMSE | Solver nRMSE |
|:---:|:---:|:---:|:---:|
| 0.1 | 0.101 | 0.537 | 1.68 × 10⁻⁴ |
| 0.4 | 0.112 | 0.978 | 6.86 × 10⁻⁴ |
| 1.0 | 0.099 | 0.934 | 1.71 × 10⁻³ |
| 2.0 | 0.095 | 0.632 | 3.33 × 10⁻³ |

- **FNO** achieves ~10% nRMSE, stable across wave speeds (spectral convolutions match wave dynamics)
- **UNet** fails catastrophically (54–98% nRMSE) — spatial convolutions cannot capture global wave propagation
- **Solver** achieves near-machine-precision (O(c² Δt²) discretization error)

### Klein-Gordon Cross-Parameter Generalization

FNO trained on one χ value, tested on all four. This tests whether neural surrogates can extrapolate across the mass parameter.

| Train χ \ Test χ | 0.5 | 1.0 | 2.0 | 5.0 |
|:---:|:---:|:---:|:---:|:---:|
| **0.5** | **0.093** | 0.096 | 0.174 | 0.891 |
| **1.0** | 0.097 | **0.094** | 0.154 | 0.877 |
| **2.0** | 0.170 | 0.150 | **0.095** | 0.789 |
| **5.0** | 0.783 | 0.771 | 0.707 | **0.098** |

**Key findings:**
1. **Small χ changes (≤2×)**: 1.0–1.9× degradation (FNO extrapolates acceptably)
2. **Large χ changes (≥5×)**: 7–10× degradation (catastrophic failure)
3. **Physical interpretation**: High χ transitions solutions from propagating to evanescent regime — a qualitative phase change that FNO cannot bridge

This generalization matrix provides a systematic test of neural PDE solver robustness to parameter variation.

## Why These Benchmarks Are Useful

1. **Wave equation** is a fundamental hyperbolic PDE missing from the current PDEBench lineup (which focuses on diffusion, reaction-diffusion, advection, and Navier-Stokes)
2. **Klein-Gordon** adds a parametric family dimension — the mass parameter χ smoothly interpolates between wave-like and dispersive/evanescent regimes
3. **Cross-parameter generalization matrix** is a new benchmark format that tests a practical deployment scenario: can a model trained at one parameter value predict at another?
4. **UNet failure** on wave equations is a clear architectural diagnostic — useful for method developers

## Model Configurations

- FNO: `pdebench/models/config/args/config_wave.yaml`
- Klein-Gordon: `pdebench/models/config/args/config_klein_gordon.yaml`

## Files Added

```
pdebench/data_gen/gen_wave.py # Data generation script
pdebench/data_gen/src/sim_wave.py # Wave/KG simulator
pdebench/data_gen/configs/wave.yaml # Hydra config
pdebench/models/config/args/config_wave.yaml # FNO/UNet training config
pdebench/models/config/args/config_klein_gordon.yaml # KG training config
```
31 changes: 31 additions & 0 deletions pdebench/data_gen/configs/wave.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# @package _global_

defaults:
- _self_
- mode: default.yaml

work_dir: ${hydra:runtime.cwd}
data_dir: data

# Output filename (without extension)
output_path: 1D_Wave_c1.0

name: 1d_wave

num_samples: 1000

sim:
c: 1.0
chi: 0.0 # 0 = wave equation, >0 = Klein-Gordon
t: 2.0
tdim: 201
xdim: 1024
ndim: 1
n_modes: 5
n: 1
seed: "???"

plot:
t_idx: 1.0
dim: 1
channel_idx: 0
188 changes: 188 additions & 0 deletions pdebench/data_gen/gen_wave.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
"""
Data generation script for the wave equation and Klein-Gordon equation.

Wave equation: d2u/dt2 = c^2 * Lap(u)
Klein-Gordon: d2u/dt2 = c^2 * Lap(u) - chi^2 * u

Uses leapfrog finite-difference integration with periodic BCs on [0,1]^d.
Initial conditions are random Fourier superpositions.

Usage:
python gen_wave.py # Default: 1D wave, c=1.0
python gen_wave.py sim.c=0.4 # Change wave speed
python gen_wave.py sim.chi=1.0 # Klein-Gordon with mass=1
python gen_wave.py sim.ndim=2 sim.xdim=128 # 2D wave equation

Output format: HDF5 with PDEBench Format 1A
- "tensor": shape (n_samples, t, x) for 1D or (n_samples, t, x, y) for 2D
- "x-coordinate": spatial grid coordinates
- "t-coordinate": temporal grid coordinates
"""

from __future__ import annotations

import logging
import multiprocessing as mp
import os
import time
from itertools import repeat
from pathlib import Path

import dotenv
import h5py
import hydra
import numpy as np
from hydra.utils import get_original_cwd
from omegaconf import DictConfig, OmegaConf

dotenv.load_dotenv()

num_threads = "4"
os.environ["OMP_NUM_THREADS"] = num_threads
os.environ["MKL_NUM_THREADS"] = num_threads
os.environ["OPENBLAS_NUM_THREADS"] = num_threads

log = logging.getLogger(__name__)


def simulator(config: DictConfig, seed: int) -> None:
"""Generate one sample and write to HDF5."""
from pdebench.data_gen.src import sim_wave

sim_config = dict(config.sim)
sim_config["seed"] = seed

log.info(f"Starting seed {seed}")
start_time = time.time()

sim = sim_wave.WaveSimulator(**sim_config)
data_sample = sim.generate_sample() # shape: (Nt, Nx) or (Nt, Nx, Nx)

duration = time.time() - start_time
log.info(f"Seed {seed} took {duration:.2f}s")

seed_str = str(seed).zfill(4)

while True:
try:
with h5py.File(str(config.output_path), "a") as f:
f.create_dataset(
f"{seed_str}/data",
data=data_sample,
dtype="float32",
compression="lzf",
)
f.create_dataset(
f"{seed_str}/grid/x",
data=sim.x.astype(np.float32),
dtype="float32",
compression="lzf",
)
f.create_dataset(
f"{seed_str}/grid/t",
data=sim.t_save.astype(np.float32),
dtype="float32",
compression="lzf",
)
Comment on lines +75 to +86
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2D runs (sim.ndim=2), this only writes grid/x and later only exports x-coordinate. PDEBench 2D datasets typically include both x-coordinate and y-coordinate, and the model loaders (e.g. PINN/FNO utilities) expect y-coordinate to exist for 2D problems. Please write grid/y (likely the same 1D coordinate as x for a square domain) when ndim==2 so the generated HDF5 is self-describing for 2D.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57ddc30 — grid/y is now written for
dim==2 alongside grid/x, using the same 1-D coordinate array (square domain).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57ddc30 -- grid/y is now written for ndim==2 alongside grid/x, using the same 1-D coordinate array (square domain). Fixed in 57ddc30 -- combine_to_tensor_format() now checks sample_shape length: for 2D (len==3) it copies both x-coordinate and y-coordinate into the output file. Fixed in 57ddc30 -- the function now raises KeyError with a descriptive message if any expected seed group is absent, instead of silently leaving a zero-filled slice. Fixed in 57ddc30 -- the upload: false field has been removed from wave.yaml since gen_wave.py contains no upload logic. Fixed in 57ddc30 -- save points are now derived from t_save directly via save_steps = np.round(t_save / dt).astype(int), ensuring each snapshot corresponds to its exact t_save entry with no drift or skipped final time.[0]

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57ddc30 -- grid/y is now written for ndim==2 alongside grid/x, using the same 1-D coordinate array (square domain).

if config.sim.ndim == 2:
f.create_dataset(
f"{seed_str}/grid/y",
data=sim.x.astype(np.float32),
dtype="float32",
compression="lzf",
)
seed_group = f[seed_str]
seed_group.attrs["config"] = OmegaConf.to_yaml(config)
except OSError:
time.sleep(0.1)
continue
else:
break


def combine_to_tensor_format(
h5_path: Path,
output_path: Path,
n_samples: int,
) -> None:
"""
Convert per-seed HDF5 to PDEBench Format 1A
(single "tensor" dataset with batch dimension).
"""
with h5py.File(str(h5_path), "r") as f_in:
# Get shape from first sample
first_key = str(0).zfill(4)
sample_shape = f_in[f"{first_key}/data"].shape

x_coord = np.array(f_in[f"{first_key}/grid/x"])
t_coord = np.array(f_in[f"{first_key}/grid/t"])
y_coord = None
if f"{first_key}/grid/y" in f_in:
y_coord = np.array(f_in[f"{first_key}/grid/y"])

# Allocate combined tensor
full_shape = (n_samples, *sample_shape)

with h5py.File(str(output_path), "w") as f_out:
tensor = f_out.create_dataset(
"tensor",
shape=full_shape,
dtype="float32",
compression="lzf",
)
for i in range(n_samples):
key = str(i).zfill(4)
if key not in f_in:
msg = (
f"Missing seed {key} in {h5_path}; "
f"expected {n_samples} consecutive seeds "
f"0000..{str(n_samples - 1).zfill(4)}"
)
raise KeyError(msg)
tensor[i] = f_in[f"{key}/data"]

Comment on lines +133 to +143
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a seed group is missing in the raw HDF5 (e.g. a worker crashed), the code silently leaves the corresponding slice of tensor as all zeros because it skips missing keys. This can produce corrupted datasets without any signal. Consider validating that all expected keys exist (or collecting the present keys and writing a smaller tensor) and raising/logging an error when samples are missing.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57ddc30 — the function now raises KeyError with a descriptive message if any expected seed group is absent, instead of silently leaving a zero-filled slice.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57ddc30 -- grid/y is now written for ndim==2 alongside grid/x, using the same 1-D coordinate array (square domain). Fixed in 57ddc30 -- combine_to_tensor_format() now checks sample_shape length: for 2D (len==3) it copies both x-coordinate and y-coordinate into the output file. Fixed in 57ddc30 -- the function now raises KeyError with a descriptive message if any expected seed group is absent, instead of silently leaving a zero-filled slice. Fixed in 57ddc30 -- the upload: false field has been removed from wave.yaml since gen_wave.py contains no upload logic. Fixed in 57ddc30 -- save points are now derived from t_save directly via save_steps = np.round(t_save / dt).astype(int), ensuring each snapshot corresponds to its exact t_save entry with no drift or skipped final time.[2]

f_out.create_dataset("x-coordinate", data=x_coord)
f_out.create_dataset("t-coordinate", data=t_coord)
Comment on lines +113 to +145
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combine_to_tensor_format() only copies x-coordinate and t-coordinate into the output file. For 2D wave/KG data the output should also include y-coordinate (and optionally z-coordinate for higher dims) to match the conventions used elsewhere in the repo and to be consumable by existing loaders. You can infer whether it is 2D from sample_shape (len==3 for 2D per-sample) and copy grid/y from the raw file when applicable.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57ddc30 — combine_to_tensor_format() now checks sample_shape length: for 2D (len==3) it copies both x-coordinate and y-coordinate into the output file.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57ddc30 -- grid/y is now written for ndim==2 alongside grid/x, using the same 1-D coordinate array (square domain). Fixed in 57ddc30 -- combine_to_tensor_format() now checks sample_shape length: for 2D (len==3) it copies both x-coordinate and y-coordinate into the output file. Fixed in 57ddc30 -- the function now raises KeyError with a descriptive message if any expected seed group is absent, instead of silently leaving a zero-filled slice. Fixed in 57ddc30 -- the upload: false field has been removed from wave.yaml since gen_wave.py contains no upload logic. Fixed in 57ddc30 -- save points are now derived from t_save directly via save_steps = np.round(t_save / dt).astype(int), ensuring each snapshot corresponds to its exact t_save entry with no drift or skipped final time.[1]

if y_coord is not None:
f_out.create_dataset("y-coordinate", data=y_coord)

log.info(f"Combined tensor format saved to {output_path} with shape {full_shape}")


@hydra.main(config_path="configs/", config_name="wave", version_base="1.1")
def main(config: DictConfig):
"""Generate wave equation dataset using Hydra config."""
temp_path = Path.cwd()
os.chdir(get_original_cwd())
os.chdir(temp_path)

work_path = Path(config.work_dir)
output_dir: Path = work_path / config.data_dir
output_dir.mkdir(exist_ok=True, parents=True)

chi_str = f"_chi{config.sim.chi}" if config.sim.chi > 0 else ""
base_name = f"{config.sim.ndim}D_Wave_c{config.sim.c}{chi_str}"
config.output_path = str((output_dir / base_name).with_suffix(".h5"))

num_samples = config.num_samples

log.info(f"Generating {num_samples} samples -> {config.output_path}")
log.info(f"PDE: d2u/dt2 = {config.sim.c}^2 * Lap(u) - {config.sim.chi}^2 * u")

# Generate samples in parallel
pool = mp.Pool(mp.cpu_count())
seeds = list(range(num_samples))
pool.starmap(simulator, zip(repeat(config), seeds))
pool.close()
pool.join()

# Convert to PDEBench Format 1A ("tensor" key)
raw_path = Path(config.output_path)
tensor_path = raw_path.parent / f"{base_name}_tensor.hdf5"
combine_to_tensor_format(raw_path, tensor_path, num_samples)

log.info("Done.")


if __name__ == "__main__":
main()
Loading