Add wave equation and Klein-Gordon equation benchmark tasks by gpartin · Pull Request #97 · pdebench/PDEBench

gpartin · 2026-03-11T16:02:32Z

Summary

This PR adds two new PDE benchmark tasks to PDEBench: the wave equation and the Klein-Gordon equation in 1D and 2D with periodic boundary conditions.

Equations

Wave equation: $\partial^2 u / \partial t^2 = c^2 \nabla^2 u$

Klein-Gordon: $\partial^2 u / \partial t^2 = c^2 \nabla^2 u - \chi^2 u$

Why these benchmarks?

Hyperbolic PDEs - wave equations are underrepresented in PDEBench (most tasks are parabolic/elliptic)
Variable wave speed - the `c` parameter creates a natural difficulty ladder
Klein-Gordon mass parameter - `chi` introduces a propagating-to-evanescent transition that exposes catastrophic extrapolation failure in neural operators
Analytical validation - exact Fourier solutions available for solver verification
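
For readers who want to see what an exact Fourier solution looks like here, the following is a minimal sketch, not the helper shipped in this PR. It assumes a uniform periodic 1D grid and uses the dispersion relation $\omega_k = \sqrt{c^2 k^2 + \chi^2}$ that follows from the equations above. The function name `fourier_solution_1d` and its arguments are hypothetical.

```python
import numpy as np


def fourier_solution_1d(u0, v0, t, c, chi, length):
    """Exact periodic solution of u_tt = c^2 u_xx - chi^2 u at time t.

    u0, v0: initial displacement and velocity sampled on a uniform grid.
    length: domain length (hypothetical argument name).
    """
    n = u0.size
    # Angular wavenumbers of the periodic grid.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    omega = np.sqrt((c * k) ** 2 + chi ** 2)

    u0_hat = np.fft.fft(u0)
    v0_hat = np.fft.fft(v0)

    # sin(omega t) / omega written via np.sinc so that omega = 0
    # (the k = 0 mode when chi = 0) yields the correct limit t.
    sin_over_omega = t * np.sinc(omega * t / np.pi)

    u_hat = u0_hat * np.cos(omega * t) + v0_hat * sin_over_omega
    return np.fft.ifft(u_hat).real
```

Each Fourier mode oscillates independently, so a single cosine initial condition stays a cosine with amplitude `cos(omega * t)`, which makes a convenient check for a time-stepping solver.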

Baseline results (FNO, 100 epochs, 1D)

Wave speed c	nRMSE
0.1	0.101
0.4	0.112
1.0	0.099
2.0	0.095

Klein-Gordon cross-chi generalization (FNO)

Training on one chi value and testing on another reveals:

Train \ Test	0.5	1.0	2.0	5.0
0.5	0.093	0.096	0.174	0.891
1.0	0.097	0.094	0.154	0.877
2.0	0.170	0.150	0.095	0.789
5.0	0.783	0.771	0.707	0.098

FNO extrapolates well for small parameter shifts but fails catastrophically across the propagating-to-evanescent transition: a model trained at chi=2 reaches nRMSE 0.095 when tested at chi=2 but 0.789 when tested at chi=5.
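
The PR text does not spell out how nRMSE is computed. As a sketch only, the block below assumes the common definition: the L2 norm of the error divided by the L2 norm of the target, computed per sample and then averaged. The function name `nrmse` and the array layout (samples along the first axis) are assumptions.

```python
import numpy as np


def nrmse(pred, target):
    """Mean over samples of ||pred - target||_2 / ||target||_2.

    Assumes axis 0 indexes samples; all remaining axes are reduced.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    axes = tuple(range(1, target.ndim))
    err = np.sqrt(np.sum((pred - target) ** 2, axis=axes))
    ref = np.sqrt(np.sum(target ** 2, axis=axes))
    return float(np.mean(err / ref))
```

Under this definition a value near 0.8, as in the chi=5 column, means the error is almost as large as the signal itself.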

Files added

`pdebench/data_gen/src/sim_wave.py` - Pure NumPy simulator (leapfrog, 1D/2D)
`pdebench/data_gen/gen_wave.py` - Hydra-based data generation with multiprocessing
`pdebench/data_gen/configs/wave.yaml` - Default generation config
`pdebench/models/config/args/config_wave.yaml` - FNO/UNet training config
`pdebench/models/config/args/config_klein_gordon.yaml` - KG training config
`WAVE_BENCHMARK.md` - Full documentation with baseline results

Files modified

`README.md` - Added gen_wave.py entry in Data Generation section

See WAVE_BENCHMARK.md for full details.

Copilot

Pull request overview

This PR adds new PDEBench benchmark tasks and supporting artifacts for the 1D/2D wave equation and Klein–Gordon equation, including a NumPy-based simulator, a Hydra-based dataset generator, training configs, and documentation.

Changes:

Added WaveSimulator (1D/2D) and an FFT-based 1D analytical solution helper.
Added gen_wave.py + wave.yaml to generate datasets in a PDEBench-style HDF5 layout.
Added model argument configs and benchmark documentation; updated README to reference the new generator.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

File	Description
pdebench/models/config/args/config_wave.yaml	Adds a wave-equation training config (FNO/UNet params).
pdebench/models/config/args/config_klein_gordon.yaml	Adds a Klein–Gordon training config (parameterized by χ).
pdebench/data_gen/src/sim_wave.py	Implements leapfrog/Verlet simulator and 1D analytical solution.
pdebench/data_gen/gen_wave.py	Implements Hydra-driven dataset generation + tensor-format consolidation.
pdebench/data_gen/configs/wave.yaml	Adds default Hydra config for wave/KG generation.
WAVE_BENCHMARK.md	Documents equations, generation usage, and baseline results.
README.md	Adds `gen_wave.py` to the data generation section.

Copilot · 2026-03-11T16:10:37Z

pdebench/data_gen/src/sim_wave.py

+            lap = laplacian(u_curr)
+            u_next = 2 * u_curr - u_prev + c2dt2 * lap - chi2dt2 * u_curr
+            u_prev = u_curr
+            u_curr = u_next
+
+            if save_idx < self.Nt and step % save_interval == 0:
+                result[save_idx] = u_curr.astype(np.float32)
+                save_idx += 1
+


The leapfrog loop saves u_curr after updating it, but u_curr is initialized to the t=dt state (via the Taylor half-step) and then advanced immediately in the first iteration. This makes the saved frame for step=1 correspond to ~t=2*dt (and in general shifts snapshot times by one step), further desynchronizing the saved solution from the t_save coordinate. Saving u_curr before advancing (or adjusting the step/time bookkeeping so u_curr corresponds to step*dt) would fix the off-by-one.

Suggested change

-            lap = laplacian(u_curr)
-            u_next = 2 * u_curr - u_prev + c2dt2 * lap - chi2dt2 * u_curr
-            u_prev = u_curr
-            u_curr = u_next
-
-            if save_idx < self.Nt and step % save_interval == 0:
-                result[save_idx] = u_curr.astype(np.float32)
-                save_idx += 1
+            if save_idx < self.Nt and step % save_interval == 0:
+                result[save_idx] = u_curr.astype(np.float32)
+                save_idx += 1
+
+            lap = laplacian(u_curr)
+            u_next = 2 * u_curr - u_prev + c2dt2 * lap - chi2dt2 * u_curr
+            u_prev = u_curr
+            u_curr = u_next

Fixed in 57ddc30 — the leapfrog loop now saves u_curr before the advance step, and save_steps are precomputed from t_save via np.searchsorted so snapshots align exactly with the requested times (including the final step).
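
To make the save-before-advance ordering concrete, here is a minimal 1D sketch. It is not the code from commit 57ddc30: the function name `leapfrog_1d`, its arguments, and the second-order periodic finite-difference Laplacian are all assumptions chosen for illustration. At the top of each iteration `u_curr` holds the state at `step * dt`, so saving there keeps frames and times in sync.

```python
import numpy as np


def leapfrog_1d(u0, v0, c, chi, dx, dt, t_save):
    """Leapfrog for u_tt = c^2 u_xx - chi^2 u on a periodic grid.

    Returns the saved frames and the times they actually correspond to.
    """
    # Map each requested time to the nearest integer step.
    save_steps = np.round(np.asarray(t_save) / dt).astype(int)
    n_steps = int(save_steps[-1])

    def accel(u):
        lap = (np.roll(u, 1) + np.roll(u, -1) - 2.0 * u) / dx**2
        return c**2 * lap - chi**2 * u

    result = np.zeros((len(save_steps), u0.size), dtype=np.float32)
    u_prev = np.asarray(u0, dtype=np.float64)
    # Taylor start-up step: u_curr is the state at t = dt.
    u_curr = u_prev + dt * v0 + 0.5 * dt**2 * accel(u_prev)

    save_idx = 0
    while save_idx < len(save_steps) and save_steps[save_idx] == 0:
        result[save_idx] = u_prev
        save_idx += 1

    for step in range(1, n_steps + 1):
        # Save first: here u_curr is the state at step * dt.
        while save_idx < len(save_steps) and save_steps[save_idx] == step:
            result[save_idx] = u_curr
            save_idx += 1
        # Then advance to (step + 1) * dt.
        u_next = 2.0 * u_curr - u_prev + dt**2 * accel(u_curr)
        u_prev, u_curr = u_curr, u_next

    return result, save_steps * dt
```

Returning `save_steps * dt` alongside the frames is a deliberate choice in this sketch: when a requested time is not an integer multiple of `dt`, the caller can see which time each frame really represents.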

Copilot · 2026-03-11T16:10:38Z

pdebench/data_gen/gen_wave.py

+                f.create_dataset(
+                    f"{seed_str}/grid/x",
+                    data=sim.x.astype(np.float32),
+                    dtype="float32",
+                    compression="lzf",
+                )
+                f.create_dataset(
+                    f"{seed_str}/grid/t",
+                    data=sim.t_save.astype(np.float32),
+                    dtype="float32",
+                    compression="lzf",
+                )


For 2D runs (sim.ndim=2), this only writes grid/x and later only exports x-coordinate. PDEBench 2D datasets typically include both x-coordinate and y-coordinate, and the model loaders (e.g. PINN/FNO utilities) expect y-coordinate to exist for 2D problems. Please write grid/y (likely the same 1D coordinate as x for a square domain) when ndim==2 so the generated HDF5 is self-describing for 2D.

Fixed in 57ddc30 -- grid/y is now written for ndim==2 alongside grid/x, using the same 1-D coordinate array (square domain).

Copilot · 2026-03-11T16:10:38Z

pdebench/data_gen/gen_wave.py

+        # Get shape from first sample
+        first_key = str(0).zfill(4)
+        sample_shape = f_in[f"{first_key}/data"].shape
+
+        x_coord = np.array(f_in[f"{first_key}/grid/x"])
+        t_coord = np.array(f_in[f"{first_key}/grid/t"])
+
+        # Allocate combined tensor
+        full_shape = (n_samples, *sample_shape)
+
+        with h5py.File(str(output_path), "w") as f_out:
+            tensor = f_out.create_dataset(
+                "tensor",
+                shape=full_shape,
+                dtype="float32",
+                compression="lzf",
+            )
+            for i in range(n_samples):
+                key = str(i).zfill(4)
+                if key in f_in:
+                    tensor[i] = f_in[f"{key}/data"]
+
+            f_out.create_dataset("x-coordinate", data=x_coord)
+            f_out.create_dataset("t-coordinate", data=t_coord)


combine_to_tensor_format() only copies x-coordinate and t-coordinate into the output file. For 2D wave/KG data the output should also include y-coordinate (and optionally z-coordinate for higher dims) to match the conventions used elsewhere in the repo and to be consumable by existing loaders. You can infer whether it is 2D from sample_shape (len==3 for 2D per-sample) and copy grid/y from the raw file when applicable.

Fixed in 57ddc30 — combine_to_tensor_format() now checks sample_shape length: for 2D (len==3) it copies both x-coordinate and y-coordinate into the output file.
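
The dimension check described in this reply could look roughly like the sketch below. It is illustrative only: `collect_coordinates` is a hypothetical helper, and `f_in` is assumed to be any mapping that supports path-style keys such as `"0000/grid/x"` (an open `h5py.File` behaves this way, and a plain dict works for testing).

```python
import numpy as np


def collect_coordinates(f_in, first_key, sample_shape):
    """Gather the coordinate arrays to copy into the combined file."""
    coords = {
        "x-coordinate": np.asarray(f_in[f"{first_key}/grid/x"]),
        "t-coordinate": np.asarray(f_in[f"{first_key}/grid/t"]),
    }
    # Per-sample shape is (Nt, Nx) in 1D and (Nt, Nx, Ny) in 2D.
    if len(sample_shape) == 3:
        coords["y-coordinate"] = np.asarray(f_in[f"{first_key}/grid/y"])
    return coords
```

Each entry of the returned dict can then be written with `f_out.create_dataset(name, data=array)`, matching the calls in the hunk above.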


Copilot · 2026-03-11T16:10:39Z

pdebench/data_gen/gen_wave.py

+            for i in range(n_samples):
+                key = str(i).zfill(4)
+                if key in f_in:
+                    tensor[i] = f_in[f"{key}/data"]
+


If a seed group is missing in the raw HDF5 (e.g. a worker crashed), the code silently leaves the corresponding slice of tensor as all zeros because it skips missing keys. This can produce corrupted datasets without any signal. Consider validating that all expected keys exist (or collecting the present keys and writing a smaller tensor) and raising/logging an error when samples are missing.

Fixed in 57ddc30 — the function now raises KeyError with a descriptive message if any expected seed group is absent, instead of silently leaving a zero-filled slice.
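
A fail-fast version of the copy loop might be sketched as follows. This is not the committed code: `stack_samples` is a hypothetical name, the error message wording is invented, and `f_in` is again assumed to be a mapping with path-style keys (an `h5py.File` or, for testing, a dict).

```python
import numpy as np


def stack_samples(f_in, n_samples):
    """Stack per-seed arrays, refusing to proceed if any seed is absent."""
    keys = [str(i).zfill(4) for i in range(n_samples)]
    # Check every expected seed up front so nothing is zero-filled silently.
    missing = [k for k in keys if f"{k}/data" not in f_in]
    if missing:
        raise KeyError(f"missing seed groups in raw file: {missing}")
    return np.stack(
        [np.asarray(f_in[f"{k}/data"], dtype=np.float32) for k in keys]
    )
```

Validating before allocating means a crashed worker surfaces as an explicit error naming the absent seeds rather than as an all-zero sample in the training set.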


Copilot · 2026-03-11T16:10:39Z

pdebench/data_gen/configs/wave.yaml

+
+work_dir: ${hydra:runtime.cwd}
+data_dir: data
+upload: false


upload: false is defined in this config, but gen_wave.py currently never checks config.upload nor performs an upload step (unlike other generators). Either implement the upload path for consistency or remove/rename the config field to avoid suggesting functionality that doesn't exist.

Suggested change

-upload: false

Fixed in 57ddc30 — the upload: false field has been removed from wave.yaml since gen_wave.py contains no upload logic.


Copilot · 2026-03-11T16:10:39Z

pdebench/data_gen/src/sim_wave.py

+        # Save schedule
+        if self.ndim == 1:
+            result = np.zeros((self.Nt, self.Nx), dtype=np.float32)
+        else:
+            result = np.zeros((self.Nt, self.Nx, self.Nx), dtype=np.float32)
+
+        result[0] = u0.astype(np.float32)
+        save_idx = 1
+        save_interval = max(1, self.n_steps // (self.Nt - 1))
+


t_save is defined as linspace(0, T, Nt), but the saving logic uses save_interval = n_steps // (Nt - 1) and only saves when step % save_interval == 0. This generally produces snapshots at times that do not match t_save (and may skip the final time T if n_steps is not an exact multiple of Nt-1). Consider computing an explicit monotone list of save_steps that matches t_save (including the final step) and saving exactly at those steps, or derive t_save from the actual saved step indices.

Fixed in 57ddc30 -- save points are now derived from t_save directly: save_steps = np.round(t_save / dt).astype(int), ensuring each snapshot corresponds to its exact t_save entry with no drift or skipped final time.
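
The difference between the two schedules is easy to see on a small case. The sketch below only reproduces which step indices get selected; it does not run a simulation, and both function names are hypothetical.

```python
import numpy as np


def interval_save_steps(n_steps, nt):
    """Step indices chosen by the original fixed-interval rule."""
    interval = max(1, n_steps // (nt - 1))
    steps = [0]
    for step in range(1, n_steps + 1):
        if len(steps) < nt and step % interval == 0:
            steps.append(step)
    return np.array(steps)


def rounded_save_steps(t_final, dt, nt):
    """Step indices derived directly from t_save = linspace(0, T, Nt)."""
    t_save = np.linspace(0.0, t_final, nt)
    return np.round(t_save / dt).astype(int)
```

With `n_steps = 100` and `Nt = 8`, the interval rule saves at steps 0, 14, 28, ..., 98 and never reaches the final step, while `rounded_save_steps(1.0, 0.01, 8)` ends at step 100.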


- Save before advance in leapfrog loop to fix off-by-one snapshot timing
- Precompute save steps from t_save for exact time alignment
- Write grid/y for 2D simulations in per-seed HDF5
- Copy y-coordinate into combined tensor format for 2D
- Raise KeyError for missing seeds instead of silent zero-fill
- Remove unused 'upload' config field from wave.yaml

10 tests covering:
- 1D/2D output shape and dtype (float32)
- Finite output (no NaN/Inf)
- Invalid ndim raises ValueError
- Klein-Gordon chi>0 runs in 1D and 2D
- Leapfrog vs analytical solution nRMSE < 1% (wave and KG)
- analytical_solution_1d returns u0 at t=0

All tests pass in 0.33s.

gpartin · 2026-03-31T22:27:15Z

Thanks for the thorough review! All 6 comments have been addressed in commit 57ddc30:

sim_wave.py off-by-one: Save now happens before the leapfrog advance step
t_save alignment: Save steps precomputed via `np.round(t_save / dt).astype(int)`, giving exact alignment with no drift and the final step always captured
2D y-coordinate (gen_wave.py): `grid/y` written for `ndim==2`
2D y-coordinate (combine_to_tensor_format): `y-coordinate` now copied for 2D sample shapes
Missing seed validation: Raises `KeyError` instead of silent zero-fill
wave.yaml upload field: Removed

Additionally, commit e703ff0 adds a pytest suite covering the simulator, analytical solution, and 1D/2D data generation.

Add wave equation and Klein-Gordon equation benchmark tasks

b5a1e56

Copilot AI review requested due to automatic review settings March 11, 2026 16:02

Copilot started reviewing on behalf of gpartin March 11, 2026 16:03

Ceyron mentioned this pull request Mar 12, 2026

Add Klein-Gordon equation stepper Ceyron/exponax#103

Closed
