[ENH] Simplified Unified Get/List API by Omswastik-11 · Pull Request #1552 · openml/openml-python

Omswastik-11 · 2025-12-23T18:49:28Z

Redundant syntax for getters

In getters, syntax is repeated and redundant, mainly through
the submodule having to be imported or addressed.

import openml

# List all datasets and their properties
openml.datasets.list_datasets(output_format="dataframe")

# Get dataset by ID
dataset = openml.datasets.get_dataset(61)

# Get dataset by name
dataset = openml.datasets.get_dataset('Fashion-MNIST')

# This is similar for flows, runs, studies, such as

study = openml.studies.get_study(42)
flow = openml.flows.get_flows(42)

API implementation

import openml

# List all datasets
datasets_df = openml.list_all("dataset", output_format="dataframe")

# Get dataset by ID
dataset = openml.get("dataset", 61)

# Get dataset by name
dataset = openml.get("dataset", "Fashion-MNIST")

# Get task
task = openml.get("task", 31)

# Get flow
flow = openml.get("flow", 10)

# Get run
run = openml.get("run", 20)

# Shortcut: infer dataset from name when no type specified
dataset = openml.get("Fashion-MNIST")

Implementation Details

Added openml.list_all(object_type: str, **kwargs) -> Any, a dispatcher that forwards to:
- list_datasets
- list_tasks
- list_flows
- list_runs
Added openml.get(object_type_or_name, identifier=None, **kwargs) -> Any, a unified getter with support for:
- Type-based lookup
```
openml.get("dataset", 61)
openml.get("dataset", "dataset_name")
```
- Name-only shortcut for datasets
```
openml.get("Fashion-MNIST")
```
Exported both functions via __all__ and documented them with docstrings.
Preserved full backward compatibility:
- Existing submodule APIs (e.g., openml.datasets.get_dataset) remain unchanged.
Added unit tests to validate dispatcher behavior without requiring network access.

openml/__init__.py

codecov-commenter · 2025-12-30T22:08:13Z

Codecov Report

❌ Patch coverage is 50.00000% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.72%. Comparing base (8a5532f) to head (13034e6).

Files with missing lines	Patch %	Lines
openml/dispatchers.py	48.14%	14 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1552      +/-   ##
==========================================
- Coverage   52.73%   52.72%   -0.02%     
==========================================
  Files          37       38       +1     
  Lines        4399     4427      +28     
==========================================
+ Hits         2320     2334      +14     
- Misses       2079     2093      +14

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…l-python into prototype-api

openml/__init__.py

for more information, see https://pre-commit.ci

Copilot

Pull request overview

This PR introduces a simplified unified API for getting and listing OpenML objects through two new dispatcher functions: openml.get() and openml.list_all(). These functions provide a more concise alternative to the existing submodule-specific APIs (e.g., openml.datasets.get_dataset()) while maintaining full backward compatibility. The implementation uses dispatch dictionaries to route calls to the appropriate underlying functions.

Changes:

Added unified get() and list_all() dispatcher functions for datasets, tasks, flows, and runs
Updated example tutorials to demonstrate the new API with legacy alternatives shown in comments
Added unit tests for the dispatcher functions using mocking

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
openml/dispatchers.py	New module containing the unified `get()` and `list_all()` dispatcher functions with type checking and error handling
openml/init.py	Exports the new dispatcher functions and module; contains duplicate entries in `__all__` list
tests/test_openml/test_openml.py	Unit tests for dispatchers covering datasets and tasks; missing coverage for flows, runs, and error conditions
examples/Basics/simple_tasks_tutorial.py	Updated to demonstrate new `openml.get()` API with legacy alternative in comments
examples/Basics/simple_flows_and_runs_tutorial.py	Updated to demonstrate both `openml.list_all()` and `openml.get()` APIs
examples/Basics/simple_datasets_tutorial.py	Updated to demonstrate both `openml.list_all()` and `openml.get()` APIs
examples/Advanced/tasks_tutorial.py	Updated multiple examples to demonstrate new `openml.list_all()` API

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

openml/dispatchers.py

openml/__init__.py

openml/dispatchers.py

tests/test_openml/test_openml.py

openml/dispatchers.py

tests/test_openml/test_openml.py

openml/dispatchers.py

geetu040

Copilot left a few solid review comments above, please take a look and address them.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tests/test_openml/test_openml.py

openml/dispatchers.py

fkiraly · 2026-03-06T14:08:12Z

@PGijsbers, what do you think of this suggestion?

geetu040

LGTM.

@PGijsbers could you please take a look and approve if you agree with the design?

openml/dispatchers.py

PGijsbers

I'm sorry to only see this so late, but I don't really see the added value for this. Please do push back if you think I am wrong in this.

I much prefer the design of #1551. I know they are technically not mutually exclusive, but I feel that PR pretty much provides the same convenience as this one in terms of removing duplication from the calls. However, unlike this PR, the #1551 method also provides:

(more) accurate type annotation, e.g., get_task returns a task whereas get("task", .. returns Any. This is useful for editors that give hints/autocomplete suggestions based on type annotations (common LSP functionality).
call-specific docstrings and parameters (e.g., task_type is for list_tasks and not other list calls), again useful for the user that they do not have to pick and choose what is the relevant documentation for their entity type.
less prone to typos: mistyping list_task (instead of plural) is caught by most LSPs because the function doens't exist, whereas list_all("task", ...) is likely not because its validity is delegated to runtime checks.

Granted, these could be remedied by:

providing different annotations and signatures with typing.overload to provide input-specific signatures and return types
providing an enum for allowed values

but at that point the #1551 seems easier to understand for users, and easier to maintain for us.

Unless I am mistaken, I suppose the main argument for this design would be "extensibility" in that adding a new type of object to retrieve doesn't change the function to call but only an argument. However, this requires either forgoing more detailed annotation (as outlined above). If you do want to provide the developer with good annotations then you would still need to document the new overload of the signature. That seems just as "much" work as providing it as a dedicated function.
We also do not expect to be adding or changing the entities on OpenML at such velocity that this makes a significant difference.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

openml/dispatchers.py

+def get(identifier: int | str, *, object_type: str = "dataset", **kwargs: Any) -> Any:
+    """Get an OpenML object by identifier.
+
+    Parameters
+    ----------
+    identifier : int | str
+        The ID or name of the object to retrieve. String identifiers are
+        supported for datasets; tasks, flows, and runs require integer IDs.
+    object_type : str, default="dataset"


openml/dispatchers.py

+}
+
+
+def list_all(object_type: str, /, **kwargs: Any) -> Any:


openml/dispatchers.py

+def get(identifier: int | str, *, object_type: str = "dataset", **kwargs: Any) -> Any:
+    """Get an OpenML object by identifier.


improved the Getter API for users

bb4554f

Omswastik-11 marked this pull request as ready for review December 24, 2025 10:22

Omswastik-11 changed the title ~~[ENH] improved the Getter API for users~~ [ENH] Simplified Unified Get/List API Dec 24, 2025