Commit 4b050db

adding technical integration details for platforms
1 parent 7984a03 commit 4b050db

17 files changed

Lines changed: 448 additions & 10 deletions

pages/Integrating New Platforms to EarthCODE/Technical Requirements.md

Lines changed: 0 additions & 7 deletions
This file was deleted.
Lines changed: 26 additions & 0 deletions
---
order: 6
---

# Access

Resource discovery and access are built on a common set of open APIs (see figure below) while remaining adaptable to different platform implementations.

![Resource Discovery & Access](/img/integration/data_access_ecstandards.png)

Catalog search uses the STAC API to perform spatial, temporal and attribute‑based queries against collections and items. Feature‑level retrieval is provided by OGC API – Features (or WFS), and multi‑dimensional data can be accessed via OGC WCS. Cloud‑optimized assets (COG, Zarr) are available for direct download, and visualization layers are exposed through OGC WMS and WMTS.

### Product Hosting

Data products can be hosted on the ESA Project Results Repository (PRR) or an external repository. Each product must be described using STAC metadata, with stable asset links and appropriate media types so that clients can resolve and open the assets.
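As a sketch, a hosted product's STAC metadata might expose an asset like the following. All IDs, URLs and values are illustrative placeholders; the media type shown is the conventional string for a cloud-optimized GeoTIFF (COG).

```python
import json

# Sketch of a STAC Item with one hosted asset. All IDs, URLs and values are
# illustrative placeholders, not real EarthCODE entries.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-product-item",
    "geometry": None,
    "properties": {"datetime": "2024-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            # Stable, resolvable link to the hosted file (PRR or external repository)
            "href": "https://example.org/products/example-product/data.tif",
            # Media type that tells clients how to open the asset (here: a COG)
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
}

print(json.dumps(item["assets"]["data"], indent=2))
```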

For details on the process of storing data, refer to the [Publish](./Publish.md) page.

### Accessing Data from the ESA Project Results Repository

To see examples of accessing EarthCODE assets from the PRR, please see the [relevant tutorial pages](https://esa-earthcode.github.io/tutorials/prr-stac-download-example/).

**Platform Integration**

The Open Science Catalog exposes open standard interfaces through which the Portal and Platforms integrate for discovery (mainly the STAC API) and publishing; note that integration for publishing currently happens directly via GitHub. Platforms can request items using their unique Experiment ID or unique Product ID.
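For illustration, a per-item request could be built against the standard STAC API path pattern; the API root below is a hypothetical placeholder, not a documented EarthCODE endpoint.

```python
# Sketch: building the request URL for a single catalog item, following the
# standard STAC API path pattern /collections/{collectionId}/items/{itemId}.
# The API root below is a hypothetical placeholder.
BASE = "https://catalog.example.org/api"

def item_url(collection_id: str, product_id: str) -> str:
    """Return the STAC API URL for one item, looked up by its unique ID."""
    return f"{BASE}/collections/{collection_id}/items/{product_id}"

print(item_url("example-collection", "example-product-id"))
```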

![Open Science Interfaces](/img/integration/openscienceinterfaces.png)
Lines changed: 38 additions & 0 deletions
---
order: 3
---

# Develop

### FAIR Open Science Platforms Detailed Requirements

FAIR Open Science Platforms within EarthCODE must provide comprehensive tooling and environments that support researchers across the entire workflow development lifecycle. This includes the following capabilities:

* **Integrated Workflow Development Tools**

  Platforms must offer environments for creating and editing scientific workflows, such as JupyterLab notebooks, IDEs, or dedicated graphical user interfaces (GUIs). These tools should allow researchers to seamlessly write code, manage software dependencies, and interact with data, ultimately producing formal "Experiment" definitions that link workflows to their specific inputs and configurations.

* **Source Code Management and Automation**

  To ensure versioning and reproducibility, platforms must integrate with source code management systems such as Git. They should provide Continuous Integration/Continuous Deployment (CI/CD) pipelines to automate the process of:

  * **Building:** Packaging the workflow and its software environment into a standardized container image (e.g., Docker).
  * **Testing:** Running automated tests to verify the correctness of the code.
  * **Deploying:** Making the versioned, containerized workflow ready for execution on FAIR Infrastructure Platforms.

* **Workflow Validation**

  Before execution, platforms must perform validation checks to prevent runtime errors and ensure interoperability. This includes:

  * **Syntax and Semantic Validation:** Verifying that the workflow definition is valid according to its specified standard (e.g., an openEO Process Graph).
  * **Compatibility Checks:** Ensuring the workflow is compatible with the target FAIR Infrastructure Platform where it is intended to be run.

* **Reproducibility Checks**

  To guarantee that experiments can be reproduced, platforms must validate the inputs and configuration before a workflow is published or executed. This involves confirming that all input data references are valid and accessible, and that all configuration parameters are complete and correct for the specified workflow.

* **Monitoring and Observability**

  Researchers need clear feedback during the development process. Platforms should provide tools for real-time logging, monitoring the status of builds and tests, and observing the behavior of workflows during development runs.

* **Documentation and Support**

  Platform providers must supply comprehensive documentation covering both user-facing guides and technical integration details. This includes operational guides for scientists and clearly defined Service Level Agreements (SLAs) for the services provided.
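As one small illustration of the validation capabilities listed above, a platform might structurally check an openEO-style process graph before execution. This is a minimal sketch assuming the common node shape; real platforms validate against the full openEO process specifications.

```python
# Minimal structural check of an openEO-style process graph, assuming the
# common node shape {node_id: {"process_id": ..., "arguments": {...}}}.
# Real platforms validate against the full openEO process specifications.
def validate_process_graph(graph: dict) -> list:
    errors = []
    result_nodes = 0
    for node_id, node in graph.items():
        if "process_id" not in node:
            errors.append(f"{node_id}: missing 'process_id'")
        if not isinstance(node.get("arguments", {}), dict):
            errors.append(f"{node_id}: 'arguments' must be an object")
        if node.get("result"):
            result_nodes += 1
    if result_nodes != 1:
        errors.append(f"expected exactly one result node, found {result_nodes}")
    return errors

# A single-node graph (placeholder collection id) passes the checks:
graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {"id": "EXAMPLE_COLLECTION"},
        "result": True,
    }
}
print(validate_process_graph(graph))  # → []
```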
Lines changed: 66 additions & 0 deletions
---
order: 7
---

# Execute

### Workflows and Experiments

In EarthCODE, a published product is not just a dataset—it is the outcome of a defined process, captured as an **Experiment**. Beyond the data itself, a product includes a reference to the experiment that produced it, making it possible to understand, reuse, and reproduce the results.

![Workflows](/img/terms/workflow-components.svg)

An Experiment describes everything needed to generate a product. This includes:

- A human-readable description of the experiment’s purpose and context
- A machine-executable workflow used to transform inputs into outputs
- A definition of the **input data** used, including references to other published products
- A **configuration** describing any parameters or settings used during execution

These components ensure that published results are reproducible. They are defined using structured metadata and described using STAC and Record Items, which together provide the semantic and technical context needed for discovery and reuse.
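Conceptually, these components could be gathered in a record like the following sketch. The field names are simplified placeholders for illustration, not the catalog's normative STAC/Record schema.

```python
# Illustrative sketch of the components an Experiment ties together, as a
# plain record. Field names and values are simplified placeholders, not the
# normative schema used by the Open Science Catalog.
experiment = {
    "id": "example-experiment",
    # Human-readable purpose and context
    "description": "Maps vegetation cover from example satellite imagery.",
    # Machine-executable workflow reference (hypothetical repository URL)
    "workflow": "https://github.com/ESA-EarthCODE/example-workflow",
    # Input data, referenced by unique product identifiers
    "inputs": ["example-input-product-id"],
    # Configuration: parameters passed to the workflow at runtime
    "configuration": {"start_date": "2020-01-01", "max_cloud_cover": 20},
}

print(sorted(experiment))
```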

A **workflow** defines the set of processing steps used in an experiment. It is not just source code—it is something that can be formally executed within a platform using a defined interface. Workflows may take different forms, including:

- openEO Process Graphs
- OGC API Processes (e.g. CWL, Application Package)
- MLflow models
- Jupyter Notebooks

The source code that supports a workflow may be referenced, but the workflow itself must be described in a way that allows it to be executed by integrated EarthCODE platforms. This distinction enables reproducibility and compatibility across platforms; integrated platforms are responsible for producing and handling these workflow definitions.

Workflows are typically stored in the **EarthCODE GitHub organization** and referenced in the EarthCODE Open Science Catalog as part of the workflow metadata.

Experiments also declare the **input datasets** used and a **configuration** that defines any parameters passed to the workflow at runtime. Inputs are referenced using unique identifiers, making it easier to validate and re-run experiments with the same data. Configuration values are usually a set of simple name–value pairs, but can vary depending on workflow complexity.

---

In summary, EarthCODE links the concepts of workflows and products: a **product** is the result of a successfully run experiment. The product metadata links back to the experiment metadata, which in turn references the workflow, input, and configuration. Together, this structure ensures reproducibility, FAIRness and Openness.

![all together](/img/terms/metadata-components.svg)

### Executing Experiments

EarthCODE integrates compute capabilities through standard interfaces that support a variety of processing paradigms. Experiments can be executed as containerized application packages, data‑cube workflows, machine learning inference services or full model training runs—all discoverable and shareable under FAIR principles. Platforms are expected to provide an interface for requesting the execution of one or more of the following.

![Processing and analytics integration](/img/integration/processes_and_analytics_integration.png)

**OGC Application Packages** ([Best Practice for EOAP](https://docs.ogc.org/bp/20-089r1.html))

General purpose algorithms and workflows are delivered as OGC Application Packages. Each package bundles code, dependencies and a defined entry point, and can be deployed on any platform supporting [OGC API – Processes](https://ogcapi.ogc.org/processes/).

**openEO Processing Graphs** ([openEO API process graphs](https://api.openeo.org/v/0.3.0/processgraphs/))

Data cube analytics leverage the openEO API, which accepts a processing graph describing operations on multidimensional datacubes. Platforms that implement the openEO specification execute these graphs close to the hosted data, enabling scalable, server‑side computation.
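As a sketch, a tiny process graph and the payload a platform might POST to an openEO backend could look like the following. The collection name, extents and output format are illustrative placeholders; real graphs are typically larger.

```python
import json

# Sketch: a small openEO process graph and the payload a platform might POST
# to an openEO backend to run it as a batch job. Collection name, extents and
# output format are illustrative placeholders.
process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "EXAMPLE_SENTINEL2_L2A",
            "spatial_extent": {"west": 5.0, "south": 50.0, "east": 6.0, "north": 51.0},
            "temporal_extent": ["2023-06-01", "2023-06-30"],
        },
    },
    "save": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "load"}, "format": "GTiff"},
        "result": True,  # marks the node whose output is returned
    },
}

payload = {"process": {"process_graph": process_graph}}
print(json.dumps(payload)[:40])
```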

**Machine Learning Model Execution**

Inference workloads can be exposed via OGC API Processes or served through [Seldon Core](https://docs.seldon.io/projects/seldon-core/en/latest/). Models packaged as reusable services respond to standardized process calls, allowing EarthCODE experiments to incorporate AI inference without bespoke integration effort.

**Machine Learning Training**

Model development and training runs are managed with [MLflow](https://mlflow.org/), which tracks experiments, parameters, metrics and artifacts. By invoking MLflow’s REST API within workflows, platforms provide end‑to‑end support for training, versioning and deployment of machine learning models.
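For illustration, the request bodies for two documented MLflow REST tracking endpoints (`runs/create` and `runs/log-metric`) can be sketched offline as follows; the tracking server URL, experiment ID and run ID are placeholders.

```python
import time

# Sketch of request bodies for two MLflow REST tracking endpoints
# (runs/create and runs/log-metric). Built offline here; the tracking server
# URL, experiment ID and run ID are placeholders.
MLFLOW = "https://mlflow.example.org"

create_run = {
    "url": f"{MLFLOW}/api/2.0/mlflow/runs/create",
    "body": {"experiment_id": "0", "start_time": int(time.time() * 1000)},
}
log_metric = {
    "url": f"{MLFLOW}/api/2.0/mlflow/runs/log-metric",
    "body": {
        "run_id": "<run-id-from-create-response>",
        "key": "val_accuracy",
        "value": 0.91,
        "timestamp": int(time.time() * 1000),
        "step": 1,
    },
}

print(create_run["url"])
```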

---

To better understand what is expected for submissions to the catalog, it helps to look at the following examples of Experiments and Products.

:::details Examples of an Experiment Submission to the Open Science Catalog

https://opensciencedata.esa.int/workflows/worldcereal-workflow/record

https://github.com/ESA-EarthCODE/open-science-catalog-metadata/blob/main/experiments/worldcereal-experiment/record.json
:::
Lines changed: 16 additions & 0 deletions
---
order: 5
---

# Find

Discoverability in EarthCODE is provided by the Open Science Catalog, which exposes all published resources through a suite of open, standardized APIs for both human and machine access.

### Discovery and Access

Catalog search uses the STAC API to perform spatial, temporal and attribute‑based queries against collections and items. Feature‑level retrieval is provided by OGC API – Features (or WFS), and multi‑dimensional data can be accessed via OGC WCS. Visualization layers may be exposed through OGC WMS and WMTS.

![Resource Discovery & Access](/img/integration/data_access_ecstandards.png)

#### Core Discovery via STAC API

The primary programmatic interface for searching the catalog is the **STAC API**. This is the main endpoint for machine-to-machine queries. It enables complex spatial, temporal, and attribute-based queries to find relevant data products and experiments. The effectiveness of these searches relies on the rich, standardized metadata described in the [Metadata Definitions](./Metadata-Definitions.md) page.
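For example, a machine-to-machine search could POST a body like the following to the catalog's STAC API search endpoint; the collection name and filter values are illustrative.

```python
import json

# Sketch of the JSON body for a POST to a STAC API /search endpoint,
# combining collection, spatial and temporal filters. The collection name
# and extents are illustrative placeholders.
search_body = {
    "collections": ["example-product-collection"],
    "bbox": [5.0, 50.0, 6.0, 51.0],  # west, south, east, north
    "datetime": "2023-01-01T00:00:00Z/2023-12-31T23:59:59Z",
    "limit": 10,
}

print(json.dumps(search_body))
```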
Lines changed: 76 additions & 0 deletions
---
order: 2
---

# Metadata Definitions
EarthCODE uses a linked-metadata approach to ensure all scientific outputs are FAIR. This relies on two core, community-driven standards: **STAC** for describing data products and **OGC API - Records** for describing the workflows and experiments that create them. Integrated platforms are responsible for generating metadata that is compliant with these standards and the specific EarthCODE profiles.

### Data Products (STAC)

In EarthCODE, the final outputs of your research—referred to as **Products**—are described and published in a way that ensures long-term FAIRness and availability.

Each product must be described using a **STAC (SpatioTemporal Asset Catalog) Collection**. This metadata captures key attributes like the spatial and temporal extent, scientific context, licensing, and provenance, making the data discoverable and understandable.

To facilitate discovery across diverse datasets, product metadata is enriched using a common dictionary of tags, including `Projects`, `Themes`, `Variables`, and the `EO-Mission` used to generate the data.

![STAC Metadata Structure](https://github.com/EOEPCA/open-science-catalog-metadata/assets/120453810/71b8e8a7-9a86-491b-ae54-1fb4de9ccf32)
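As an illustrative sketch, a product Collection with OSC-style tags might look like the following. The `osc:`-prefixed field names are assumed here to follow the OSC STAC extension; the exact field set and all values are placeholders for illustration only.

```python
# Minimal sketch of a STAC Collection enriched with Open Science Catalog
# tags. The "osc:"-prefixed field names are an assumption based on the OSC
# STAC extension; all values are illustrative placeholders.
collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-product",
    "description": "Example product description.",
    "license": "CC-BY-4.0",
    "extent": {
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        "temporal": {"interval": [["2020-01-01T00:00:00Z", None]]},
    },
    # Discovery tags (placeholder values)
    "osc:project": "example-project",
    "osc:themes": ["land"],
    "osc:variables": ["vegetation-cover"],
    "osc:missions": ["sentinel-2"],
    "links": [],
}

print(collection["id"])
```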

#### Platform Requirements for Data Products

- **Schema Compliance:** Platforms must generate STAC Collection JSON that is compliant with the EarthCODE definitions and terms, which build upon the Open Science Catalog (OSC) STAC extension. The formal definition can be found at: [https://github.com/stac-extensions/osc](https://github.com/stac-extensions/osc).
- **Validation:** All generated STAC metadata **must** pass validation using the official EarthCODE validation library before it can be published. See: [https://github.com/ESA-EarthCODE/open-science-catalog-validation](https://github.com/ESA-EarthCODE/open-science-catalog-validation).
- **Manual Submissions:** While platforms are expected to automate these steps, a guide for manual submissions is available here: [Contributing to the Open Science Catalog](../../Technical%20Documentation/Open%20Science%20Catalog/Contributing%20to%20the%20Open%20Science%20Catalog.md).

### Workflows and Experiments (OGC API - Records)

In EarthCODE, a published product is not just a dataset—it is the outcome of a defined and reproducible process, captured as an **Experiment**. The product's metadata must include a reference to the experiment that produced it.

An Experiment record describes everything needed to regenerate a product, including:
- A human-readable description of its purpose and context.
- A link to a machine-executable **Workflow** used to transform inputs into outputs.
- A definition of the **Input Data** used, including references to other published products.
- The **Configuration** describing parameters or settings used during execution.

Workflows may take different forms, such as openEO Process Graphs, OGC API Processes (e.g., CWL, Application Package), MLflow models, or Jupyter Notebooks. The source code may be referenced, but the workflow itself must be described in a way that allows it to be formally executed by an integrated EarthCODE platform.
72+
73+
#### Platform Requirements for Workflows and Experiments
74+
- **Schema Compliance:** Platforms must generate metadata for workflows and experiments that is compliant with the **OGC API - Records** standard and the specific EarthCODE profiles.
75+
- **Validation:** As with data products, all workflow and experiment metadata **must** pass validation using the [`open-science-catalog-validation`](https://github.com/ESA-EarthCODE/open-science-catalog-validation) package before publication.
76+
- **Reproducibility:** The generated metadata must contain all necessary information to re-run the experiment, including links to versioned code, container images, input data identifiers, and configuration parameters.
