[ENH] Simplified Unified Get/List API#1552

Open
Omswastik-11 wants to merge 19 commits into openml:main from Omswastik-11:prototype-api

@Omswastik-11
Contributor

@Omswastik-11 Omswastik-11 commented Dec 23, 2025

Redundant syntax for getters

Getter calls are repetitive: the object type appears both in the submodule
name and in the function name, and the submodule has to be imported or addressed each time.

import openml

# List all datasets and their properties
openml.datasets.list_datasets(output_format="dataframe")

# Get dataset by ID
dataset = openml.datasets.get_dataset(61)

# Get dataset by name
dataset = openml.datasets.get_dataset('Fashion-MNIST')

# The same pattern applies to studies, flows, and runs, for example:

study = openml.study.get_study(42)
flow = openml.flows.get_flow(42)

API implementation

import openml

# List all datasets
datasets_df = openml.list_all("dataset", output_format="dataframe")

# Get dataset by ID
dataset = openml.get("dataset", 61)

# Get dataset by name
dataset = openml.get("dataset", "Fashion-MNIST")

# Get task
task = openml.get("task", 31)

# Get flow
flow = openml.get("flow", 10)

# Get run
run = openml.get("run", 20)

# Shortcut: infer dataset from name when no type specified
dataset = openml.get("Fashion-MNIST")

Implementation Details

  • Added openml.list_all(object_type: str, **kwargs) -> Any, a dispatcher that forwards to:

    • list_datasets
    • list_tasks
    • list_flows
    • list_runs
  • Added openml.get(object_type_or_name, identifier=None, **kwargs) -> Any, a unified getter with support for:

    • Type-based lookup

      openml.get("dataset", 61)
      openml.get("dataset", "dataset_name")
    • Name-only shortcut for datasets

      openml.get("Fashion-MNIST")
  • Exported both functions via __all__ and documented them with docstrings.

  • Preserved full backward compatibility:

    • Existing submodule APIs (e.g., openml.datasets.get_dataset) remain unchanged.
  • Added unit tests to validate dispatcher behavior without requiring network access.
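
To make the dispatch idea concrete, here is a minimal sketch of how the two functions described above could route calls. It is not the code from this PR: the `_get_*` and `_list_*` functions are hypothetical stand-ins for the real submodule functions so that the example runs without openml installed, and the error messages are assumptions.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-ins for the submodule functions (e.g. get_dataset,
# list_datasets); they only echo their input so the sketch runs offline.
def _get_dataset(identifier: Any, **kwargs: Any) -> Any:
    return ("dataset", identifier)

def _get_task(identifier: Any, **kwargs: Any) -> Any:
    return ("task", identifier)

def _list_datasets(**kwargs: Any) -> Any:
    return ["dataset-listing", kwargs]

# Dispatch tables: object type -> function that handles it.
_GETTERS: Dict[str, Callable[..., Any]] = {
    "dataset": _get_dataset,
    "task": _get_task,
}
_LISTERS: Dict[str, Callable[..., Any]] = {"dataset": _list_datasets}

def list_all(object_type: str, **kwargs: Any) -> Any:
    """Forward to the lister registered for ``object_type``."""
    if object_type not in _LISTERS:
        raise ValueError(
            f"Unsupported object type {object_type!r}; expected one of {sorted(_LISTERS)}"
        )
    return _LISTERS[object_type](**kwargs)

def get(object_type_or_name: Any, identifier: Optional[Any] = None, **kwargs: Any) -> Any:
    """Forward to the getter registered for the given object type."""
    if identifier is None:
        # Name-only shortcut: treat the single argument as a dataset name.
        return _GETTERS["dataset"](object_type_or_name, **kwargs)
    if object_type_or_name not in _GETTERS:
        raise ValueError(
            f"Unsupported object type {object_type_or_name!r}; expected one of {sorted(_GETTERS)}"
        )
    return _GETTERS[object_type_or_name](identifier, **kwargs)
```

With these stand-ins, `get("dataset", 61)` and `get("Fashion-MNIST")` both reach `_get_dataset`, while an unknown type such as `get("nope", 1)` fails only at runtime with a `ValueError`.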

@Omswastik-11 Omswastik-11 marked this pull request as ready for review December 24, 2025 10:22
@Omswastik-11 Omswastik-11 changed the title from "[ENH] improved the Getter API for users" to "[ENH] Simplified Unified Get/List API" Dec 24, 2025
@Omswastik-11 Omswastik-11 requested a review from fkiraly December 25, 2025 08:30
@codecov-commenter

codecov-commenter commented Dec 30, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.72%. Comparing base (8a5532f) to head (13034e6).

Files with missing lines | Patch % | Lines
openml/dispatchers.py | 48.14% | 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1552      +/-   ##
==========================================
- Coverage   52.73%   52.72%   -0.02%     
==========================================
  Files          37       38       +1     
  Lines        4399     4427      +28     
==========================================
+ Hits         2320     2334      +14     
- Misses       2079     2093      +14     


Copilot AI review requested due to automatic review settings February 26, 2026 10:50

Copilot AI left a comment

Pull request overview

This PR introduces a simplified unified API for getting and listing OpenML objects through two new dispatcher functions: openml.get() and openml.list_all(). These functions provide a more concise alternative to the existing submodule-specific APIs (e.g., openml.datasets.get_dataset()) while maintaining full backward compatibility. The implementation uses dispatch dictionaries to route calls to the appropriate underlying functions.

Changes:

  • Added unified get() and list_all() dispatcher functions for datasets, tasks, flows, and runs
  • Updated example tutorials to demonstrate the new API with legacy alternatives shown in comments
  • Added unit tests for the dispatcher functions using mocking
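
As an illustration of testing a dispatcher with mocking and without network access, the sketch below patches the underlying getter and checks that the call is forwarded. It is an assumption about the general technique, not the tests from this PR: `backend`, `_network_get_dataset`, and this cut-down `get` are hypothetical names used only so the example is self-contained.

```python
from types import SimpleNamespace
from typing import Any
from unittest import mock

def _network_get_dataset(identifier: Any, **kwargs: Any) -> Any:
    # Hypothetical stand-in for a getter that would contact the server.
    raise RuntimeError("would contact the server")

# Hypothetical namespace playing the role of a submodule such as openml.datasets.
backend = SimpleNamespace(get_dataset=_network_get_dataset)

# Object type -> attribute name on `backend`; resolved at call time so that
# patching `backend` takes effect.
_GETTER_NAMES = {"dataset": "get_dataset"}

def get(object_type: str, identifier: Any, **kwargs: Any) -> Any:
    if object_type not in _GETTER_NAMES:
        raise ValueError(f"Unsupported object type: {object_type!r}")
    getter = getattr(backend, _GETTER_NAMES[object_type])
    return getter(identifier, **kwargs)

def test_get_dataset_dispatches_without_network() -> None:
    # Replace the network-bound getter with a mock for the duration of the test.
    with mock.patch.object(backend, "get_dataset", return_value="fake-dataset") as patched:
        result = get("dataset", 61, download_data=False)
    # The dispatcher must forward the identifier and keyword arguments unchanged.
    patched.assert_called_once_with(61, download_data=False)
    assert result == "fake-dataset"

test_get_dataset_dispatches_without_network()
```

The same pattern extends to flows, runs, and the error paths by adding entries to the table and asserting on `ValueError` for unknown types.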

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

File | Description
openml/dispatchers.py | New module containing the unified get() and list_all() dispatcher functions with type checking and error handling
openml/__init__.py | Exports the new dispatcher functions and module; contains duplicate entries in __all__ list
tests/test_openml/test_openml.py | Unit tests for dispatchers covering datasets and tasks; missing coverage for flows, runs, and error conditions
examples/Basics/simple_tasks_tutorial.py | Updated to demonstrate new openml.get() API with legacy alternative in comments
examples/Basics/simple_flows_and_runs_tutorial.py | Updated to demonstrate both openml.list_all() and openml.get() APIs
examples/Basics/simple_datasets_tutorial.py | Updated to demonstrate both openml.list_all() and openml.get() APIs
examples/Advanced/tasks_tutorial.py | Updated multiple examples to demonstrate new openml.list_all() API

Collaborator

@geetu040 geetu040 left a comment

Copilot left a few solid review comments above; please take a look and address them.

Omswastik-11 and others added 2 commits February 27, 2026 14:17
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 27, 2026 08:48
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.



@fkiraly
Collaborator

fkiraly commented Mar 6, 2026

@PGijsbers, what do you think of this suggestion?

Collaborator

@geetu040 geetu040 left a comment

LGTM.

@PGijsbers could you please take a look and approve if you agree with the design?

Collaborator

@PGijsbers PGijsbers left a comment

I'm sorry to only see this so late, but I don't really see the added value of this. Please do push back if you think I am wrong about this.

I much prefer the design of #1551. I know they are technically not mutually exclusive, but I feel that that PR provides pretty much the same convenience as this one in terms of removing duplication from the calls. However, unlike this PR, the #1551 approach also provides:

  • (more) accurate type annotations, e.g., get_task returns a task whereas get("task", ...) returns Any. This is useful for editors that give hints/autocomplete suggestions based on type annotations (common LSP functionality).
  • call-specific docstrings and parameters (e.g., task_type applies to list_tasks and not to other list calls), so users do not have to pick out which documentation is relevant for their entity type.
  • better resistance to typos: mistyping list_task (instead of the plural list_tasks) is caught by most LSPs because the function doesn't exist, whereas a mistyped string in list_all("task", ...) is likely not, because its validity is delegated to runtime checks.

Granted, these could be remedied by:

  • providing different annotations and signatures with typing.overload to provide input-specific signatures and return types
  • providing an enum for allowed values

but at that point the #1551 design seems easier to understand for users, and easier to maintain for us.
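
For reference, the typing.overload remedy mentioned above could look roughly like the sketch below. It is illustrative only and not part of either PR: `Dataset` and `Task` are hypothetical stand-ins for the real OpenML classes, and `Literal` strings are used here in place of an enum for the allowed values.

```python
from dataclasses import dataclass
from typing import Literal, Union, overload

@dataclass
class Dataset:
    # Hypothetical stand-in for the real dataset class.
    identifier: Union[int, str]

@dataclass
class Task:
    # Hypothetical stand-in for the real task class.
    identifier: int

# One overload per object type gives editors an input-specific signature
# and a precise return type instead of Any.
@overload
def get(object_type: Literal["dataset"], identifier: Union[int, str]) -> Dataset: ...
@overload
def get(object_type: Literal["task"], identifier: int) -> Task: ...

def get(object_type: str, identifier: Union[int, str]) -> Union[Dataset, Task]:
    if object_type == "dataset":
        return Dataset(identifier)
    if object_type == "task":
        if not isinstance(identifier, int):
            raise TypeError("tasks require an integer ID")
        return Task(identifier)
    # Still needed at runtime: type checkers only help callers who run them.
    raise ValueError(f"Unsupported object type: {object_type!r}")
```

A static type checker would now reject a call such as `get("tsk", 1)` because no overload matches, but every new object type needs its own overload, which is the maintenance cost discussed in this comment.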

Unless I am mistaken, I suppose the main argument for this design would be "extensibility", in that adding a new type of object to retrieve doesn't change the function to call but only an argument. However, this requires forgoing more detailed annotation (as outlined above). If you do want to provide the developer with good annotations, then you would still need to document the new overload of the signature. That seems like just as much work as providing it as a dedicated function.
We also do not expect to be adding or changing the entities on OpenML at such velocity that this makes a significant difference.

Copilot AI review requested due to automatic review settings March 13, 2026 13:55

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.



Comment on lines +70 to +78
def get(identifier: int | str, *, object_type: str = "dataset", **kwargs: Any) -> Any:
    """Get an OpenML object by identifier.

    Parameters
    ----------
    identifier : int | str
        The ID or name of the object to retrieve. String identifiers are
        supported for datasets; tasks, flows, and runs require integer IDs.
    object_type : str, default="dataset"
}


def list_all(object_type: str, /, **kwargs: Any) -> Any:
Comment on lines +70 to +71
def get(identifier: int | str, *, object_type: str = "dataset", **kwargs: Any) -> Any:
    """Get an OpenML object by identifier.

7 participants