Commit f263f1a

feat: Add entrypoint for anemoi-datasets to PyEarthTools (#216)
* feat: Add entrypoint for anemoi-datasets to PyEarthTools
* Allow usage of pipeline directly
* Add ECMWF License
* Apply suggestions from code review

Co-authored-by: Tennessee Leeuwenburg <134973832+tennlee@users.noreply.github.com>
1 parent b1217c6 commit f263f1a

5 files changed

Lines changed: 118 additions & 0 deletions

NOTICE.md

Lines changed: 2 additions & 0 deletions
@@ -15,3 +15,5 @@ The file packages/data/src/pyearthtools/data/indexes/extensions.py extends and i
 The package packages/bundled_models/fourcastnext extends and is significantly based on the code from https://github.com/nci/FourCastNeXt which is made available under the Apache 2.0 license. That repository in turn extends the code from https://github.com/NVlabs/FourCastNet/, released under the BSD 3-Clause license. The FourCastNet model is described in detail at https://arxiv.org/abs/2202.11214. The FourCastNeXt model is described in detail at https://arxiv.org/abs/2401.05584, and a version of the FourCastNeXt code is bundled, adapted for compatibility and maintained within the PyEarthTools repository so it can continue to be a useful reference implementation and learning aid.
 
 The package packages/bundled_models/lucie extends and is based on the code from https://github.com/ISCLPennState/LUCIE, which is made available under the MIT license. The LUCIE model is described in detail at https://doi.org/10.48550/arXiv.2405.16297. The version of the model bundled in PyEarthTools may undergo changes associated with package maintenance and compatibility so it can continue to be a useful reference implementation and learning aid. Within that repository, those authors bundle the file "torch_harmonics_local.py", which is based on https://github.com/NVIDIA/torch-harmonics . The bundled file has an Apache 2.0 copyright statement included in it but at the time of writing the NVIDIA repository carries the BSD 3-clause license. Both of these licenses allow bundling to occur and all relevant files preserve the copyright statement within the files. Copyright for the original works go to the LUCIE and torch-harmonics developers respectively.
+
+The file packages/pipeline/src/pyearthtools/pipeline/entrypoints/anemoi.py was originally developed by ECMWF, released under the Apache 2.0 license.

docs/api/api.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ data/data_index
 data/data_api
 pipeline/pipeline_index
 pipeline/pipeline_api
+pipeline/pipeline_entrypoints
 training/training_index
 training/training_api
 tutorial/tutorial_index
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# Pipeline Entrypoints
+
+Because `PyEarthTools` pipelines provide a generic way to load and prepare various earth system datasets, a pipeline can be used
+as a source for [anemoi-datasets](https://anemoi.readthedocs.io/projects/datasets/en/latest/).
+
+## Example
+
+Below is a minimal example of using a `PyEarthTools` pipeline to load data and prepare it for `anemoi`. See the `anemoi` docs
+for more information on the `datasets` config.
+
+### Create the Pipeline in PyEarthTools
+
+.. code-block:: python
+
+    import pyearthtools.data
+    import pyearthtools.pipeline
+
+    pipeline = pyearthtools.pipeline.Pipeline(
+        pyearthtools.data.download.arcoera5.ARCOERA5(['t2m', 'u10', 'v10']),
+        pyearthtools.pipeline.operations.xarray.values.FillNan()
+    )
+    pipeline.save('/PATH/TO/PIPELINE.yaml')
+
+### Create the anemoi-datasets config
+
+.. code-block:: yaml
+
+    name: pyearthtools_to_anemoi
+    description: PyEarthTools Pipeline converted to Anemoi
+    attribution: PyEarthTools
+
+    dates:
+      start: '2025-11-10T00:00:00'
+      end: '2025-11-12T00:00:00'
+      frequency: 1h
+
+    input:
+      pyearthtools: # Use the pyearthtools input object
+        pipeline: /PATH/TO/PIPELINE.yaml
+
+### Run anemoi-datasets
+
+.. code-block:: bash
+
+    anemoi-datasets create /path/to/anemoi/dataset.yaml
+
+## Function Contract
+
+The `PyEarthTools` pipeline is expected to return an `xarray` object containing a single time index.
+
+Both tools provide methods to modify the metadata of the data; use them as needed to prepare the data for downstream use.
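
To make the contract concrete, here is a minimal sketch using only the standard library. It mirrors how the source added in this commit indexes the pipeline with `date.isoformat()`, once per requested date. `StubPipeline` and `hourly_dates` are hypothetical names introduced for illustration: the real pipeline returns an `xarray` object, and the date list is built on the assumption that both `start` and `end` from the example config are included at the stated `frequency` (check the `anemoi` docs for the exact semantics).

```python
from datetime import datetime, timedelta


def hourly_dates(start: str, end: str, step_hours: int = 1) -> list[datetime]:
    """Enumerate timestamps from start to end (both included, an assumption)."""
    current, last = datetime.fromisoformat(start), datetime.fromisoformat(end)
    dates = []
    while current <= last:
        dates.append(current)
        current += timedelta(hours=step_hours)
    return dates


class StubPipeline:
    """Hypothetical stand-in: one lookup by ISO timestamp yields one time step."""

    def __getitem__(self, timestamp: str) -> dict:
        # A real pipeline would return an xarray object with a single time index.
        return {"time": [timestamp], "t2m": [0.0]}


dates = hourly_dates("2025-11-10T00:00:00", "2025-11-12T00:00:00")
pipeline = StubPipeline()
# One lookup per date, keyed by the ISO-format timestamp.
samples = [pipeline[date.isoformat()] for date in dates]
```

Under these assumptions the pipeline is queried once per timestamp, so each returned object only ever needs to describe a single time step.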

packages/pipeline/pyproject.toml

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,10 @@ dependencies = [
 
 dynamic = ["version", "readme"]
 
+[project.entry-points]
+# Add PyEarthTools as an anemoi datasets source
+"anemoi.datasets.create.sources".pyearthtools = "pyearthtools.pipeline.entrypoints.anemoi:pyearthtoolsSource"
+
 [project.optional-dependencies]
 distributed = [
     "dask",
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# (C) Copyright 2025- European Centre for Medium-Range Weather Forecasts (ECMWF)
+
+# This software is licensed under the terms of the Apache Licence Version 2.0
+# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
+# In applying this licence, ECMWF does not waive the privileges and immunities
+# granted to it by virtue of its status as an intergovernmental organisation nor
+# does it submit to any jurisdiction.
+
+
+from functools import cached_property
+from pathlib import Path
+
+from pyearthtools.pipeline import load
+from pyearthtools.pipeline import Pipeline
+
+import earthkit.data as ekd
+from anemoi.datasets.create.source import Source
+from anemoi.datasets.create.typing import DateList
+
+
+class pyearthtoolsSource(Source):
+    emoji = "🌏"  # For tracing
+
+    def __init__(self, context, pipeline: str | Path | Pipeline):
+        """Initialise the source.
+
+        Parameters
+        ----------
+        context : Any
+            The context for the data source.
+        pipeline : str | Path | Pipeline
+            The path to the pyearthtools pipeline file, or a `Pipeline` object to use directly.
+        """
+        super().__init__(context)
+        self._pyearthtools_pipeline = pipeline
+
+    @cached_property
+    def pipeline(self) -> Pipeline:
+        pipeline = self._pyearthtools_pipeline
+        if isinstance(pipeline, Pipeline):
+            return pipeline
+        return load(pipeline)
+
+    def execute(self, dates: DateList) -> ekd.FieldList:
+        """Execute the source.
+
+        Parameters
+        ----------
+        dates : DateList
+            The input dates.
+
+        Returns
+        -------
+        ekd.FieldList
+            The output data.
+        """
+        fields = []
+        for date in dates:
+            fields.extend(ekd.from_object(self.pipeline[date.isoformat()]))  # type: ignore
+        return ekd.FieldList.from_fields(fields)

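The `pipeline` property of `pyearthtoolsSource` defers loading until first use and accepts either a path or a ready-made `Pipeline`. A self-contained sketch of that pattern follows; `FakePipeline`, `fake_load` and `LazySource` are hypothetical stand-ins for `Pipeline`, `load` and the source class, so that the example runs without PyEarthTools installed.

```python
from functools import cached_property
from pathlib import Path

load_calls = []  # records every path passed to fake_load


class FakePipeline:
    """Hypothetical stand-in for pyearthtools.pipeline.Pipeline."""


def fake_load(path):
    """Hypothetical stand-in for pyearthtools.pipeline.load."""
    load_calls.append(path)
    return FakePipeline()


class LazySource:
    def __init__(self, pipeline: "str | Path | FakePipeline"):
        self._pipeline = pipeline  # nothing is loaded yet

    @cached_property
    def pipeline(self) -> FakePipeline:
        if isinstance(self._pipeline, FakePipeline):
            return self._pipeline  # already a pipeline object, use it directly
        return fake_load(self._pipeline)  # load from file on first access only


source = LazySource("/PATH/TO/PIPELINE.yaml")
first, second = source.pipeline, source.pipeline  # second access hits the cache
```

Because `cached_property` stores the result on the instance, the file is read at most once per source object, however many dates `execute` iterates over.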