Skip to content

Match all facet values when using FacetFilter with keep=False#209

Merged
bouweandela merged 3 commits intoClimate-REF:mainfrom
bouweandela:improve-facetfilter-exclude
Mar 27, 2025
Merged

Match all facet values when using FacetFilter with keep=False#209
bouweandela merged 3 commits intoClimate-REF:mainfrom
bouweandela:improve-facetfilter-exclude

Conversation

@bouweandela
Copy link
Copy Markdown
Contributor

@bouweandela bouweandela commented Mar 27, 2025

Description

For the TCRE metric #208, I need variables "tas" and "fco2antt" from the "esm-1pctCO2" experiment and variable "tas" from the "esm-piControl" experiment. Therefore I wrote the following filters:

FacetFilter(
    facets={
        "variable_id": ("tas", "fco2antt"),
        "frequency": "mon",
        "experiment_id": ("esm-1pctCO2", "esm-piControl"),
    },
),
FacetFilter(
    facets={
        "variable_id": "fco2antt",
        "experiment_id": "esm-piControl",
    },
    keep=False,
),

The first filter selects all files with "variable_id" "tas" and "fco2antt" from both experiments and the second filter then excludes files with "variable_id" "fco2antt" from the "esm-piControl" experiment.

The implementation here makes the above filters work as expected.

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bouweandela bouweandela force-pushed the improve-facetfilter-exclude branch from ad67433 to d60ea50 Compare March 27, 2025 09:52
@bouweandela bouweandela marked this pull request as ready for review March 27, 2025 09:55
@bouweandela bouweandela requested a review from mikapfl March 27, 2025 09:55
Copy link
Copy Markdown
Contributor

@mikapfl mikapfl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, thanks.

It still feels to me like we should be able to just use pandas_indexings filters instead of discovering problems like these on our own, but that's a topic for another day.

@bouweandela
Copy link
Copy Markdown
Contributor Author

I'm not very familiar with pandas, could you explain that in a bit more detail? I had another look at the pandas indexing documentation and found a simpler solution that uses a bit more of pandas in 452abf5.

@mikapfl
Copy link
Copy Markdown
Contributor

mikapfl commented Mar 27, 2025

Ah, sorry, I was talking about https://pandas-indexing.readthedocs.io/en/latest/ , where we can write stuff like

filter = (
    isin(variable_id=["tas", "fco2antt"], frequency="mon", experiment_id=["esm-1pctCO2", "esm-piControl"])
    & ~isin(variable="fco2antt", experiment_id="esm-piControl")
)
data_catalog = data_catalog.loc[filter]

which I think is at least equally readable compared to our custom FacetFilter class and syntax, but someone else already built it for us. However, it is of course another dependency, and we already have FacetFilters and they work and we don't want to refactor everything. So, sorry for the distraction.

@bouweandela bouweandela merged commit 5b5f21f into Climate-REF:main Mar 27, 2025
14 checks passed
lewisjared added a commit that referenced this pull request Mar 28, 2025
* origin/main: (89 commits)
  Do not list duplicate entries from the CLI (#210)
  Add changelog
  Match all facet values when using `FacetFilter` with `keep=False` (#209)
  Disable pushing container images from forks (#207)
  Nicer error message when specifying wrong columns in `ref datasets list` (#203)
  add changelog and minor changes
  chore: Reenable skip on push
  chore: Fix integration test
  chore: Remove slow CI job
  chore: Fix output directory calculation in ilamb
  chore: Add python version to the names
  fix: Sanitise directory names
  test: Fetch ILAMB data
  test: Use a non-hidden directory
  test: Track the output directories as artifacts
  chore: Remove unused comment
  docs: changelog
  chore: changelog
  test: Add test coverage
  feat: Add a timeout parameter to the solve command
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants