pdebench · gpartin · Mar 11, 2026 · Mar 11, 2026 · Mar 19, 2026 · Apr 2, 2026
diff --git a/README.md b/README.md
@@ -140,6 +140,8 @@ The data generation codes are contained in [data_gen](./pdebench/data_gen):
 - `gen_radial_dam_break.py` to generate the 2D shallow-water data.
 - `gen_ns_incomp.py` to generate the 2D incompressible inhomogeneous
   Navier-Stokes data.
+- `gen_wave.py` to generate 1D/2D wave equation and Klein-Gordon equation data.
+  See [WAVE_BENCHMARK.md](WAVE_BENCHMARK.md) for details and baseline results.
 - `plot.py` to plot the generated data.
 - `uploader.py` to upload the generated data to the data repository.
 - `.env` is the environment data to store Dataverse URL and API token to upload

diff --git a/WAVE_BENCHMARK.md b/WAVE_BENCHMARK.md
@@ -0,0 +1,109 @@
+# Wave Equation & Klein-Gordon Benchmark
+
+**Contributed by:** Greg Partin ([@gpartin](https://github.com/gpartin))
+
+New PDE benchmark tasks for PDEBench: the **wave equation** and **Klein-Gordon equation** in 1D and 2D with periodic boundary conditions.
+
+## Equations
+
+**Wave equation:**
+$$\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u$$
+
+**Klein-Gordon equation:**
+$$\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u - \chi^2 u$$
+
+Both solved on periodic domain $[0, 1]^d$ with random Fourier initial conditions.
+
+## Data Generation
+
+```bash
+# 1D wave equation, c=1.0, 1000 samples
+python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=0.0
+
+# Vary wave speed
+python -m pdebench.data_gen.gen_wave sim.c=0.1
+python -m pdebench.data_gen.gen_wave sim.c=0.4
+python -m pdebench.data_gen.gen_wave sim.c=2.0
+
+# Klein-Gordon with mass parameter
+python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=0.5
+python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=1.0
+python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=2.0
+python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.chi=5.0
+
+# 2D wave equation
+python -m pdebench.data_gen.gen_wave sim.c=1.0 sim.ndim=2 sim.xdim=128
+```
+
+## Solver Details
+
+- **Method**: Leapfrog (Verlet) finite-difference, second-order in space and time
+- **Boundary conditions**: Periodic
+- **CFL**: dt = 0.5 * dx / (c * sqrt(ndim))
+- **Spatial resolution**: 1024 (1D), 128×128 (2D)
+- **Temporal resolution**: 201 snapshots over t ∈ [0, 2]
+- **Initial conditions**: Random superposition of 5 Fourier modes, normalized to unit max amplitude
+
+## HDF5 Format
+
+Output follows PDEBench Format 1A:
+- `"tensor"`: shape `(n_samples, t, x)` for 1D, `(n_samples, t, x, y)` for 2D
+- `"x-coordinate"`: spatial grid
+- `"t-coordinate"`: temporal grid
+
+## Baseline Results
+
+### 1D Wave Equation — FNO vs UNet
+
+Trained with autoregressive 41-step rollout, `initial_step=10`, spatial resolution 256 (downsampled 4×), temporal resolution 51 (downsampled 4×), 100 epochs.
+
+| Wave Speed c | FNO nRMSE | UNet nRMSE | Solver nRMSE |
+|:---:|:---:|:---:|:---:|
+| 0.1 | 0.101 | 0.537 | 1.68 × 10⁻⁴ |
+| 0.4 | 0.112 | 0.978 | 6.86 × 10⁻⁴ |
+| 1.0 | 0.099 | 0.934 | 1.71 × 10⁻³ |
+| 2.0 | 0.095 | 0.632 | 3.33 × 10⁻³ |
+
+- **FNO** achieves ~10% nRMSE, stable across wave speeds (spectral convolutions match wave dynamics)
+- **UNet** fails catastrophically (54–98% nRMSE) — spatial convolutions cannot capture global wave propagation
+- **Solver** achieves near-machine-precision (O(c² Δt²) discretization error)
+
+### Klein-Gordon Cross-Parameter Generalization
+
+FNO trained on one χ value, tested on all four. This tests whether neural surrogates can extrapolate across the mass parameter.
+
+| Train χ \ Test χ | 0.5 | 1.0 | 2.0 | 5.0 |
+|:---:|:---:|:---:|:---:|:---:|
+| **0.5** | **0.093** | 0.096 | 0.174 | 0.891 |
+| **1.0** | 0.097 | **0.094** | 0.154 | 0.877 |
+| **2.0** | 0.170 | 0.150 | **0.095** | 0.789 |
+| **5.0** | 0.783 | 0.771 | 0.707 | **0.098** |
+
+**Key findings:**
+1. **Small χ changes (≤2×)**: 1.0–1.9× degradation (FNO extrapolates acceptably)
+2. **Large χ changes (≥5×)**: 7–10× degradation (catastrophic failure)
+3. **Physical interpretation**: High χ transitions solutions from propagating to evanescent regime — a qualitative phase change that FNO cannot bridge
+
+This generalization matrix provides a systematic test of neural PDE solver robustness to parameter variation.
+
+## Why These Benchmarks Are Useful
+
+1. **Wave equation** is a fundamental hyperbolic PDE missing from the current PDEBench lineup (which focuses on diffusion, reaction-diffusion, advection, and Navier-Stokes)
+2. **Klein-Gordon** adds a parametric family dimension — the mass parameter χ smoothly interpolates between wave-like and dispersive/evanescent regimes
+3. **Cross-parameter generalization matrix** is a new benchmark format that tests a practical deployment scenario: can a model trained at one parameter value predict at another?
+4. **UNet failure** on wave equations is a clear architectural diagnostic — useful for method developers
+
+## Model Configurations
+
+- FNO: `pdebench/models/config/args/config_wave.yaml`
+- Klein-Gordon: `pdebench/models/config/args/config_klein_gordon.yaml`
+
+## Files Added
+
+```
+pdebench/data_gen/gen_wave.py                      # Data generation script
+pdebench/data_gen/src/sim_wave.py                   # Wave/KG simulator
+pdebench/data_gen/configs/wave.yaml                 # Hydra config
+pdebench/models/config/args/config_wave.yaml        # FNO/UNet training config
+pdebench/models/config/args/config_klein_gordon.yaml # KG training config
+```
diff --git a/pdebench/data_gen/configs/wave.yaml b/pdebench/data_gen/configs/wave.yaml
@@ -0,0 +1,31 @@
+# @package _global_
+
+defaults:
+  - _self_
+  - mode: default.yaml
+
+work_dir: ${hydra:runtime.cwd}
+data_dir: data
+
+# Output filename (without extension)
+output_path: 1D_Wave_c1.0
+
+name: 1d_wave
+
+num_samples: 1000
+
+sim:
+  c: 1.0
+  chi: 0.0       # 0 = wave equation, >0 = Klein-Gordon
+  t: 2.0
+  tdim: 201
+  xdim: 1024
+  ndim: 1
+  n_modes: 5
+  n: 1
+  seed: "???"
+
+plot:
+  t_idx: 1.0
+  dim: 1
+  channel_idx: 0
diff --git a/pdebench/data_gen/gen_wave.py b/pdebench/data_gen/gen_wave.py
@@ -0,0 +1,188 @@
+"""
+Data generation script for the wave equation and Klein-Gordon equation.
+
+Wave equation: d2u/dt2 = c^2 * Lap(u)
+Klein-Gordon:  d2u/dt2 = c^2 * Lap(u) - chi^2 * u
+
+Uses leapfrog finite-difference integration with periodic BCs on [0,1]^d.
+Initial conditions are random Fourier superpositions.
+
+Usage:
+    python gen_wave.py                        # Default: 1D wave, c=1.0
+    python gen_wave.py sim.c=0.4              # Change wave speed
+    python gen_wave.py sim.chi=1.0            # Klein-Gordon with mass=1
+    python gen_wave.py sim.ndim=2 sim.xdim=128  # 2D wave equation
+
+Output format: HDF5 with PDEBench Format 1A
+    - "tensor": shape (n_samples, t, x) for 1D or (n_samples, t, x, y) for 2D
+    - "x-coordinate": spatial grid coordinates
+    - "t-coordinate": temporal grid coordinates
+"""
+
+from __future__ import annotations
+
+import logging
+import multiprocessing as mp
+import os
+import time
+from itertools import repeat
+from pathlib import Path
+
+import dotenv
+import h5py
+import hydra
+import numpy as np
+from hydra.utils import get_original_cwd
+from omegaconf import DictConfig, OmegaConf
+
+dotenv.load_dotenv()
+
+num_threads = "4"
+os.environ["OMP_NUM_THREADS"] = num_threads
+os.environ["MKL_NUM_THREADS"] = num_threads
+os.environ["OPENBLAS_NUM_THREADS"] = num_threads
+
+log = logging.getLogger(__name__)
+
+
+def simulator(config: DictConfig, seed: int) -> None:
+    """Generate one sample and write to HDF5."""
+    from pdebench.data_gen.src import sim_wave
+
+    sim_config = dict(config.sim)
+    sim_config["seed"] = seed
+
+    log.info(f"Starting seed {seed}")
+    start_time = time.time()
+
+    sim = sim_wave.WaveSimulator(**sim_config)
+    data_sample = sim.generate_sample()  # shape: (Nt, Nx) or (Nt, Nx, Nx)
+
+    duration = time.time() - start_time
+    log.info(f"Seed {seed} took {duration:.2f}s")
+
+    seed_str = str(seed).zfill(4)
+
+    while True:
+        try:
+            with h5py.File(str(config.output_path), "a") as f:
+                f.create_dataset(
+                    f"{seed_str}/data",
+                    data=data_sample,
+                    dtype="float32",
+                    compression="lzf",
+                )
+                f.create_dataset(
+                    f"{seed_str}/grid/x",
+                    data=sim.x.astype(np.float32),
+                    dtype="float32",
+                    compression="lzf",
+                )
+                f.create_dataset(
+                    f"{seed_str}/grid/t",
+                    data=sim.t_save.astype(np.float32),
+                    dtype="float32",
+                    compression="lzf",
+                )
+                if config.sim.ndim == 2:
+                    f.create_dataset(
+                        f"{seed_str}/grid/y",
+                        data=sim.x.astype(np.float32),
+                        dtype="float32",
+                        compression="lzf",
+                    )
+                seed_group = f[seed_str]
+                seed_group.attrs["config"] = OmegaConf.to_yaml(config)
+        except OSError:
+            time.sleep(0.1)
+            continue
+        else:
+            break
+
+
+def combine_to_tensor_format(
+    h5_path: Path,
+    output_path: Path,
+    n_samples: int,
+) -> None:
+    """
+    Convert per-seed HDF5 to PDEBench Format 1A
+    (single "tensor" dataset with batch dimension).
+    """
+    with h5py.File(str(h5_path), "r") as f_in:
+        # Get shape from first sample
+        first_key = str(0).zfill(4)
+        sample_shape = f_in[f"{first_key}/data"].shape
+
+        x_coord = np.array(f_in[f"{first_key}/grid/x"])
+        t_coord = np.array(f_in[f"{first_key}/grid/t"])
+        y_coord = None
+        if f"{first_key}/grid/y" in f_in:
+            y_coord = np.array(f_in[f"{first_key}/grid/y"])
+
+        # Allocate combined tensor
+        full_shape = (n_samples, *sample_shape)
+
+        with h5py.File(str(output_path), "w") as f_out:
+            tensor = f_out.create_dataset(
+                "tensor",
+                shape=full_shape,
+                dtype="float32",
+                compression="lzf",
+            )
+            for i in range(n_samples):
+                key = str(i).zfill(4)
+                if key not in f_in:
+                    msg = (
+                        f"Missing seed {key} in {h5_path}; "
+                        f"expected {n_samples} consecutive seeds "
+                        f"0000..{str(n_samples - 1).zfill(4)}"
+                    )
+                    raise KeyError(msg)
+                tensor[i] = f_in[f"{key}/data"]
+
+            f_out.create_dataset("x-coordinate", data=x_coord)
+            f_out.create_dataset("t-coordinate", data=t_coord)
+            if y_coord is not None:
+                f_out.create_dataset("y-coordinate", data=y_coord)
+
+    log.info(f"Combined tensor format saved to {output_path} with shape {full_shape}")
+
+
+@hydra.main(config_path="configs/", config_name="wave", version_base="1.1")
+def main(config: DictConfig):
+    """Generate wave equation dataset using Hydra config."""
+    temp_path = Path.cwd()
+    os.chdir(get_original_cwd())
+    os.chdir(temp_path)
+
+    work_path = Path(config.work_dir)
+    output_dir: Path = work_path / config.data_dir
+    output_dir.mkdir(exist_ok=True, parents=True)
+
+    chi_str = f"_chi{config.sim.chi}" if config.sim.chi > 0 else ""
+    base_name = f"{config.sim.ndim}D_Wave_c{config.sim.c}{chi_str}"
+    config.output_path = str((output_dir / base_name).with_suffix(".h5"))
+
+    num_samples = config.num_samples
+
+    log.info(f"Generating {num_samples} samples -> {config.output_path}")
+    log.info(f"PDE: d2u/dt2 = {config.sim.c}^2 * Lap(u) - {config.sim.chi}^2 * u")
+
+    # Generate samples in parallel
+    pool = mp.Pool(mp.cpu_count())
+    seeds = list(range(num_samples))
+    pool.starmap(simulator, zip(repeat(config), seeds))
+    pool.close()
+    pool.join()
+
+    # Convert to PDEBench Format 1A ("tensor" key)
+    raw_path = Path(config.output_path)
+    tensor_path = raw_path.parent / f"{base_name}_tensor.hdf5"
+    combine_to_tensor_format(raw_path, tensor_path, num_samples)
+
+    log.info("Done.")
+
+
+if __name__ == "__main__":
+    main()