Open
Changes from 13 commits
Commits
56 commits
c129966
Add recognised intake-esm datastores on NCI systems to config_develop…
charles-turner-1 Feb 4, 2025
b1b76fb
Skeleton
charles-turner-1 Feb 5, 2025
dd73d1d
Playing around
charles-turner-1 Feb 5, 2025
ed1676b
Almost at a working IntakeDataset.load()
charles-turner-1 Feb 12, 2025
fa1ea2e
Working intake-esm implementation - probably still some kinks to iron…
charles-turner-1 Feb 25, 2025
648f119
Working with multiple catalogues per project
charles-turner-1 Mar 12, 2025
2b91fec
Cleanup - mypy & ruff errors
charles-turner-1 Mar 13, 2025
c7b8ffb
Remove WIP
charles-turner-1 Mar 13, 2025
31b35cb
Update depenencies & dev environment
charles-turner-1 Mar 13, 2025
a8532a5
Pre-commit modifications
charles-turner-1 Mar 13, 2025
7e56959
Merge branch 'main' into intake-esm
charles-turner-1 Mar 13, 2025
568cb8d
Fixed most of codacy (mypy-strict?) gripes
charles-turner-1 Mar 13, 2025
91fee56
Fix typo
charles-turner-1 Mar 13, 2025
9d894b9
Beginning to work on Bouwe's comments (WIP)
charles-turner-1 Apr 2, 2025
59d0d02
Updates - restructured esmvalcore/data/intake following Bouwe's sugge…
charles-turner-1 Apr 3, 2025
2050081
Reorder imports (ruff maybe?)
charles-turner-1 May 6, 2025
59e4205
Add `_read_facets` to intake configuration: see https://github.com/in…
charles-turner-1 May 12, 2025
2527059
Add `merge_intake_seach_history` function (see https://github.com/int…
charles-turner-1 May 13, 2025
4641965
Merge branch 'main' into intake-esm
charles-turner-1 May 13, 2025
1b26148
Merge branch 'main' into intake-esm
valeriupredoi Dec 4, 2025
b77d194
readd intake
valeriupredoi Dec 4, 2025
e131cfc
Merge branch 'main' into intake-esm
charles-turner-1 Jan 20, 2026
a53e140
Add `data.io.intake_esm.py, scaffold off `data.io.intake_esgf.py`
charles-turner-1 Jan 22, 2026
b84cf70
WIP
charles-turner-1 Jan 22, 2026
ef6fdba
Scaffold tests
charles-turner-1 Jan 22, 2026
f1b8f55
Remove /data/intake stuff, /config/_intake
charles-turner-1 Jan 30, 2026
ec17bfa
Pre-commit
charles-turner-1 Jan 30, 2026
81fc7dc
Merge branch 'main' into intake-esm
charles-turner-1 Jan 30, 2026
5417f6c
Nearly there I think - all tests passing. Hopefully CI can tell me wh…
charles-turner-1 Feb 4, 2026
d988b17
Pre-commit
charles-turner-1 Feb 4, 2026
8533644
Remove old intake-esm file
charles-turner-1 Feb 4, 2026
1c41a35
Merge branch 'main' into intake-esm
charles-turner-1 Feb 4, 2026
15a79f5
Merge branch 'main' into intake-esm
charles-turner-1 Feb 5, 2026
b7ceeea
Sort keys when finding data - should guarantee order stability
charles-turner-1 Feb 5, 2026
6dd736f
Change path import style to match `/tests/integration/preprocessor/_i…
charles-turner-1 Feb 5, 2026
5403ac8
Revert ugly type ignore stuff
charles-turner-1 Feb 5, 2026
27aa007
Un-ignore the intake-esm data ncfiles
charles-turner-1 Feb 5, 2026
b911086
Merge branch 'main' into intake-esm
charles-turner-1 Feb 9, 2026
ce4b2eb
- Pass through quiet arg correctly, remove special time range parsing…
charles-turner-1 Feb 9, 2026
e731b2e
Merge branch 'main' into intake-esm
charles-turner-1 Mar 16, 2026
664044a
Remove comment
charles-turner-1 Mar 16, 2026
b9fccda
Merge branch 'intake-esm' of github.com:ESMValGroup/ESMValCore into i…
charles-turner-1 Mar 16, 2026
67ea7fd
Remove line subsetting variables - we're using paths anyway, so subse…
charles-turner-1 Mar 23, 2026
ecb5057
Add a `time_separator` field (needed for flexibility in how datastoer…
charles-turner-1 Mar 23, 2026
a891762
Allow catalog into constructor
charles-turner-1 Mar 23, 2026
f7396e4
Add sample config
charles-turner-1 Mar 23, 2026
0592d08
Merge branch 'main' into intake-esm
charles-turner-1 Mar 24, 2026
758f43a
Add description to intake-esm config file
charles-turner-1 Mar 24, 2026
d26a52b
Merge branch 'intake-esm' of github.com:ESMValGroup/ESMValCore into i…
charles-turner-1 Mar 24, 2026
7c71929
Merge branch 'main' into intake-esm
charles-turner-1 Mar 25, 2026
11e3fa7
Merge branch 'main' into intake-esm
charles-turner-1 Mar 31, 2026
e90ed39
- Update tests & functionality to use pangeo cloud catalog and pass t…
charles-turner-1 Mar 31, 2026
08153dd
Update data-intake-esm config.yml & quickstart config docs
charles-turner-1 Apr 1, 2026
cbbc4fb
Add configs for various locations
charles-turner-1 Apr 1, 2026
2c74ff2
update docs for intake-esm
charles-turner-1 Apr 1, 2026
543226c
Remove duplicate key
charles-turner-1 Apr 1, 2026
2 changes: 2 additions & 0 deletions environment.yml
@@ -18,6 +18,8 @@ dependencies:
  - fire
  - geopy
  - humanfriendly
  - intake >=2.0.0
  - intake-esm >=2025.2.3
  - iris >=3.11  # 3.11 first to support Numpy 2 and Python 3.13
  - iris-esmf-regrid >=0.11.0
  - iris-grib >=0.20.0  # github.com/ESMValGroup/ESMValCore/issues/2535
74 changes: 74 additions & 0 deletions esmvalcore/config-developer.yml
@@ -38,6 +38,34 @@ CMIP6:
    SYNDA: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
    NCI: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
  input_file: '{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc'
  catalogs:
    NCI:
      - file:
          /g/data/fs38/catalog/v2/esm/catalog.json
        facets:
          activity: activity_id
          dataset: source_id
          ensemble: member_id
          exp: experiment_id
          grid: grid_label
          institute: institution_id
          mip: table_id
          short_name: variable_id
          version: version
          frequency: frequency
      - file:
          /g/data/oi10/catalog/v2/esm/catalog.json
        facets:
          activity: activity_id
          dataset: source_id
          ensemble: member_id
          exp: experiment_id
          grid: grid_label
          institute: institution_id
          mip: table_id
          short_name: variable_id
          version: version
          frequency: frequency
  output_file: '{project}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}_{grid}'
  cmor_type: 'CMIP6'

@@ -56,6 +84,36 @@ CMIP5:
    SMHI: '{dataset}/{ensemble}/{exp}/{frequency}'
    SYNDA: '{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}'
  input_file: '{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc'
  catalogs:
    NCI:
      - file:
          /g/data/rr3/catalog/v2/esm/catalog.json
        facets:
          # mapping from recipe facets to intake-esm catalog facets
          # TODO: Fix these when Gadi is back up
          activity: activity_id
          dataset: source_id
          ensemble: ensemble
          exp: experiment
          grid: grid_label
          institute: institution_id
          mip: table_id
          short_name: variable
          version: version
      - file:
          /g/data/al33/catalog/v2/esm/catalog.json
        facets:
          # mapping from recipe facets to intake-esm catalog facets
          # TODO: Fix these when Gadi is back up
          activity: activity_id
          dataset: source_id
          ensemble: ensemble
          exp: experiment
          institute: institute
          mip: table
          short_name: variable
          version: version
          timerange: time_range
  output_file: '{project}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}'

CMIP3:
@@ -156,6 +214,22 @@ CORDEX:
    ESGF: '{project.lower}/output/{domain}/{institute}/{driver}/{exp}/{ensemble}/{dataset}/{rcm_version}/{frequency}/{short_name}/{version}'
    SYNDA: '{domain}/{institute}/{driver}/{exp}/{ensemble}/{dataset}/{rcm_version}/{frequency}/{short_name}/{version}'
  input_file: '{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc'
  catalogs:
    NCI:
      - file:
          /g/data/oi10/catalog/v2/esm/catalog.json
        facets:
          # mapping from recipe facets to intake-esm catalog facets
          # TODO: Fix these when Gadi is back up
          activity: activity_id
          dataset: source_id
          ensemble: member_id
          exp: experiment_id
          grid: grid_label
          institute: institution_id
          mip: table_id
          short_name: variable_id
          version: version
  output_file: '{project}_{institute}_{dataset}_{rcm_version}_{driver}_{domain}_{mip}_{exp}_{ensemble}_{short_name}'
  cmor_type: 'CMIP5'
  cmor_path: 'cordex'
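
As a reading aid for the `facets` blocks above: each entry maps a recipe facet name (left) to a catalog column name (right), and `_find_files` in `_dataset.py` turns that mapping into a search query, dropping any facet the dataset does not set. A minimal standalone sketch of that translation follows. `build_query` is a hypothetical helper name, not part of the PR, and the mapping shown is only a subset of the CMIP6 block.

```python
from typing import Any


def build_query(
    recipe_facets: dict[str, Any],
    facet_mapping: dict[str, str],
) -> dict[str, Any]:
    """Translate recipe facets into catalog column filters.

    Hypothetical helper for illustration only. Facets that the recipe does
    not set (missing or None) are left out of the query.
    """
    return {
        column: recipe_facets[facet]
        for facet, column in facet_mapping.items()
        if recipe_facets.get(facet) is not None
    }


# Subset of the CMIP6 mapping from the config above.
cmip6_mapping = {
    "dataset": "source_id",
    "exp": "experiment_id",
    "short_name": "variable_id",
    "version": "version",
}

query = build_query(
    {"project": "CMIP6", "dataset": "ACCESS-ESM1-5", "short_name": "tos"},
    cmip6_mapping,
)
print(query)
```

Note that `project` has no entry in the mapping, so it never reaches the query; in the PR such facets are collected separately as unmapped facets.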
5 changes: 5 additions & 0 deletions esmvalcore/intake/__init__.py
@@ -0,0 +1,5 @@
"""Find files using an intake-esm catalog and load them."""

from ._dataset import IntakeDataset, load_catalogs

__all__ = ["IntakeDataset", "load_catalogs"]
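
`load_catalogs`, exported here, keeps opened catalogs in a module-level dictionary keyed by path (see `_CACHE` in `_dataset.py`), so each catalog is opened at most once per process. The pattern can be sketched without intake-esm installed. `open_cached`, `clear_cache`, and the `opener` callable are hypothetical stand-ins for the PR's call to `intake.open_esm_datastore`, not its actual code.

```python
from pathlib import Path
from typing import Any, Callable

# Module-level cache: one opened object per path.
_CACHE: dict[Path, Any] = {}


def open_cached(path: Path, opener: Callable[[Path], Any]) -> Any:
    """Return the object opened from ``path``, calling ``opener`` at most once."""
    if path not in _CACHE:
        _CACHE[path] = opener(path)
    return _CACHE[path]


def clear_cache() -> None:
    """Forget all opened objects, e.g. between tests."""
    _CACHE.clear()


calls: list[Path] = []


def fake_opener(path: Path) -> dict[str, str]:
    """Stand-in for an expensive open; records each call."""
    calls.append(path)
    return {"opened": str(path)}


first = open_cached(Path("catalog.json"), fake_opener)
second = open_cached(Path("catalog.json"), fake_opener)
```

Because the cache lives at module level, tests that swap configurations need an explicit reset, which is the role `clear_catalog_cache` plays in the PR.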
164 changes: 164 additions & 0 deletions esmvalcore/intake/_dataset.py
@@ -0,0 +1,164 @@
"""Import datasets using Intake-ESM."""

import logging
from numbers import Number
from pathlib import Path
from typing import Any, Sequence

# import isodate
import intake
import intake_esm

from esmvalcore.config import CFG
from esmvalcore.config._config import get_project_config
from esmvalcore.dataset import Dataset, File
from esmvalcore.local import LocalFile

__all__ = ["IntakeDataset", "load_catalogs", "clear_catalog_cache"]

logger = logging.getLogger(__name__)

_CACHE: dict[Path, intake_esm.core.esm_datastore] = {}


def clear_catalog_cache() -> None:
    """Clear the catalog cache."""
    _CACHE.clear()


def load_catalogs(
    project: str, drs: dict
) -> tuple[list[intake_esm.core.esm_datastore], list[dict[str, str]]]:
    """Load all intake-esm catalogs for a project and their associated facet mappings.

    Parameters
    ----------
    project : str
        The project name, e.g. 'CMIP6'.
    drs : dict
        The DRS configuration. Can be obtained from the global configuration drs
        field, e.g. CFG['drs'].

    Returns
    -------
    list of intake_esm.core.esm_datastore
        The catalogs configured for the project at the selected site.
    list of dict
        The facet mappings, one per catalog. Each is a dictionary mapping
        ESMValCore dataset facet names to the fields in the intake-esm
        datastore.
    """
    catalog_info: dict[str, Any] = get_project_config(project).get(
        "catalogs", {}
    )
    site = drs.get(project, "default")
    if site not in catalog_info:
        # No catalogs configured for this site: return empty lists so that
        # callers can detect this with a simple truthiness check.
        return [], []

    catalog_urls = [
        Path(catalog.get("file")).expanduser()
        for catalog in catalog_info[site]
    ]
    facet_list = [catalog.get("facets") for catalog in catalog_info[site]]

    for catalog_url in catalog_urls:
        if catalog_url not in _CACHE:
            logger.info(
                "Loading intake-esm catalog (this may take some time): %s",
                catalog_url,
            )
            _CACHE[catalog_url] = intake.open_esm_datastore(catalog_url)
            logger.info("Successfully loaded catalog %s", catalog_url)

    return ([_CACHE[cat_url] for cat_url in catalog_urls], facet_list)


class IntakeDataset(Dataset):
    """Load data using Intake-ESM."""

    def __init__(self, **facets):
        project = facets["project"]
        self.catalog, self._facets = load_catalogs(project, CFG["drs"])
        self._unmapped_facets = {}
        super().__init__(**facets)

    @property
    def files(self) -> Sequence[File]:
        """Files found for this dataset in the configured catalogs."""
        if self._files is None:
            self._files = self._find_files(self.facets, CFG["drs"])
        return self._files

    @files.setter
    def files(self, value: Sequence[File]):
        """Manually set the files for the dataset."""
        self._files = value

    @property
    def filenames(self) -> Sequence[str]:
        """String representation of the filenames in the dataset."""
        return [str(f) for f in self.files]

    def _find_files(  # type: ignore[override]
        self,
        facet_map: dict[str, str | Sequence[str] | Number],
        drs: dict[str, Any],
    ) -> Sequence[File]:
        """Find files for variable in all intake-esm catalogs associated with a project.

        As a side effect, sets the _unmapped_facets attribute - this is used to
        cache facets which are not in the datastore.

        Parameters
        ----------
        facet_map : dict
            A dict of the facets used to initialise the IntakeDataset
            object, keyed by their ESMValCore facet names. For example,
            ```
            ACCESS_ESM1_5 = IntakeDataset(
                short_name='tos',
                project='CMIP6',
            )
            ```
            would result in a facet_map of {'short_name': 'tos', 'project': 'CMIP6'}.
        drs : dict
            The DRS configuration. Can be obtained from the global configuration drs
            field, e.g. CFG['drs'].
        """
        if not isinstance(facet_map["project"], str):
            raise TypeError(
                "The project facet must be a string for Intake Datasets."
            )

        catalogs, facets_list = load_catalogs(facet_map["project"], drs)
        if not catalogs:
            return []

        files = []

        for catalog, facets in zip(catalogs, facets_list, strict=False):
            # Translate ESMValCore facet names to catalog column names and
            # drop facets that are not set on this dataset.
            query = {val: facet_map.get(key) for key, val in facets.items()}
            query = {key: val for key, val in query.items() if val is not None}

            unmapped = {
                key: val for key, val in facet_map.items() if key not in facets
            }
            unmapped.pop("project", None)

            self._unmapped_facets = unmapped

            selection = catalog.search(**query)

            # Select latest version
            if "version" in facets and "version" not in facet_map:
                # Look the version column up by its mapped catalog name,
                # matching the column used in the query below.
                latest_version = max(
                    selection.unique()[facets["version"]]
                )  # These are strings - need to double check the sorting here.
                facet_map["version"] = latest_version
                query = {
                    facets["version"]: latest_version,
                }
                selection = selection.search(**query)

            files += [LocalFile(f) for f in selection.unique().path]

        self.augment_facets()
        return files
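
The inline comment in `_find_files` flags that versions are compared as strings. Plain string comparison only agrees with numeric order when every version has the same width (for example `v20190308`-style dates); `max(["v9", "v10"])` returns `"v9"`. One possible remedy, sketched here under the assumption that a version is a `v` prefix followed by digits, is to compare on the numeric part. `version_key` and `latest_version` are hypothetical names, not part of the PR.

```python
import re
from collections.abc import Iterable


def version_key(version: str) -> tuple[int, str]:
    """Sort key that orders versions by their first run of digits.

    Assumes versions look like 'v20190308' or 'v2'. Versions without any
    digits sort before all numbered ones; the raw string breaks ties.
    """
    match = re.search(r"\d+", version)
    number = int(match.group()) if match else -1
    return (number, version)


def latest_version(versions: Iterable[str]) -> str:
    """Return the version with the largest numeric part."""
    return max(versions, key=version_key)


mixed = ["v9", "v10"]
dated = ["v20190308", "v20200101", "v20181126"]
print(latest_version(mixed), latest_version(dated))
```

Whether this is needed depends on how versions are actually stored in the NCI catalogs, which the diff does not show.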
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -45,6 +45,8 @@ dependencies = [
    "fire",
    "geopy",
    "humanfriendly",
    "intake>=2.0.0",
    "intake-esm>=2025.2.3",
    "iris-grib>=0.20.0",  # github.com/ESMValGroup/ESMValCore/issues/2535
    "isodate>=0.7.0",
    "jinja2",