|
| 1 | +# Workflow Best Practices |
| 2 | +:::warning 🛠️ Page Under Development |
| 3 | +Content is being actively developed and updated for this page. EarthCODE's documentation is a living document and will be continuously updated with detailed reviews. |
| 4 | +::: |
| 5 | + |
| 6 | + |
| 7 | +This page describes the design decisions and best practices for creating and distributing scientific workflows so that they deliver the most value and impact within the EarthCODE ecosystem. A high-quality workflow is more than just code that runs; it is a complete, transparent, and robust scientific narrative, packaged in a way that is easy for others (and your future self) to understand, reuse, and reproduce. This guide provides practical recommendations based on widely accepted community standards and established software development principles. The suggested guidelines are summarized in the Workflow Quality Assurance Checklist below. |
| 8 | + |
| 9 | +**The effort put into quality assurance for research code should be proportionate to the analysis's complexity and risk. While not every script needs production-level rigor, reproducibility is the minimum standard.** |
| 10 | + |
| 11 | +When developing and publishing your workflow, consider these best practices: |
| 12 | + |
| 13 | +- [Structure Your Project Logically](./workflow-best-practices.md#structure-your-project-logically) — Organize files consistently. |
| 14 | +- [Use Version Control Effectively](./workflow-best-practices.md#use-version-control-effectively) — Track changes using Git. |
| 15 | +- [Explicitly Define the Environment](./workflow-best-practices.md#explicitly-define-the-environment) — List dependencies and versions. |
| 16 | +- [Tell a Story (in Notebooks)](./workflow-best-practices.md#tell-a-story-in-notebooks) — Explain context/methods/results in Markdown. |
| 17 | +- [Modularize and Refactor Code](./workflow-best-practices.md#modularize-and-refactor-code) — Avoid duplication; use functions/modules. |
| 18 | +- [Adopt a Consistent Coding Style](./workflow-best-practices.md#adopt-a-consistent-coding-style) — Follow style guides (e.g., PEP 8). |
| 19 | +- [Build a Reproducible Analytical Pipeline](./workflow-best-practices.md#build-a-reproducible-analytical-pipeline) — Design for automation; configure externally. |
| 20 | +- [Implement Basic Testing](./workflow-best-practices.md#implement-basic-testing) — Include basic code checks. |
| 21 | +- [Ensure Executability](./workflow-best-practices.md#ensure-executability) — Package code and environment for reuse. |
| 22 | +- [Link Code Version to Results](./workflow-best-practices.md#link-code-version-to-results) — Link code versions to results via Experiments. |
| 23 | + |
| 24 | +--- |
| 25 | +<ClientOnly> |
| 26 | + <Checklist |
| 27 | + title="Workflow Quality Assurance Checklist" |
| 28 | + :items="[ |
| 29 | + 'Use a clear, standard directory structure e.g., code, environment, docs.', |
| 30 | + 'Include a README.md explaining the project, setup, and usage.', |
| 31 | + 'Use Git for version control from the start.', |
| 32 | + 'Use .gitignore to exclude data, secrets, environment files e.g. .env, and outputs.', |
| 33 | + 'Explicitly list all software dependencies in an environment file e.g., environment.yml, requirements.txt, Dockerfile.', |
| 34 | + 'Pin key dependency versions in the environment file.', |
| 35 | + 'Follow a standard code style guide e.g., PEP 8 for Python.', |
| 36 | + 'Refactor repetitive code into functions or classes.', |
| 37 | + 'Consider moving complex/reusable code into separate modules e.g., .py files.', |
| 38 | + 'Use comments to explain the why, not the what of complex code.', |
| 39 | + 'Add docstrings to functions and classes.', |
| 40 | + 'Separate configuration parameters, paths, endpoints from code, preferably using environment variables.', |
| 41 | + 'Ensure the workflow runs non-interactively from start to finish.', |
| 42 | + 'For notebooks, regularly test with Restart Kernel and Run All Cells.', |
| 43 | + 'Include basic checks e.g., assert statements to validate data or results.', |
| 44 | + 'Document input data requirements clearly in the README.md.', |
| 45 | + 'Access data from discoverable sources e.g., cloud storage, OSC Products rather than committing data.', |
| 46 | + 'Package the workflow for execution e.g., container image, OGC Application Package.' |
| 47 | + ]" |
| 48 | + storage-key="earthcode-quality-workflow" |
| 49 | + /> |
| 50 | +</ClientOnly> |
| 51 | + |
| 52 | + |
| 53 | +## How Research Code Differs |
| 54 | + |
| 55 | +Research code often differs significantly from traditional software development. It is frequently written by domain experts, such as scientists or analysts, whose main goal is to answer a specific research question, generate insights from data, or test a hypothesis, rather than to build a long-lasting production service. |
| 56 | + |
| 57 | +A key characteristic is its **exploratory nature**: much research code begins as exploration and evolves rapidly as understanding grows, which can initially leave it less structured than production software. The primary focus is often on obtaining scientifically correct results and insights, which sometimes takes precedence over software engineering practices such as extensive testing or user interface design. Some research code is developed for a single analysis or publication, but workflows are increasingly designed for reuse and adaptation. Crucially, and unlike in many commercial applications, the ability of others (and the original author) to exactly **reproduce the results** from the code and data is a fundamental requirement for scientific validity. Understanding these differences helps in applying quality assurance practices appropriately. |
| 58 | + |
| 59 | + |
| 60 | +## Why Focus on Quality Research Code |
| 61 | + |
| 62 | +Although research code has unique characteristics, focusing on its quality is vital. High-quality, well-documented code is essential for others, including your future self, to trust your results. It forms the foundation for **reproducibility** – the ability to run the same analysis with the same data and get the same outcome, which is the cornerstone of scientific validation. |
| 63 | + |
| 64 | +Clean, understandable code makes it easier for peers and collaborators to review your methods, verify your implementation, and identify potential errors or improvements. Well-structured and documented code is also easier to adapt for new datasets or research questions. Investing time in quality up front limits "technical debt": it saves significant effort later by avoiding the need to rewrite or debug poorly written code, and it makes building on previous work more efficient. |
| 65 | + |
| 66 | +Sharing high-quality code alongside data supports **transparency and open science**, allowing the broader community to understand, scrutinize, and benefit from your work. It also aligns with funding agency requirements for quality and auditability. Applying proportionate quality assurance practices, even to exploratory code, ultimately increases the reliability, impact, and longevity of your research. |
| 67 | + |
| 68 | + |
| 69 | +<!-- |
| 70 | +Key pieces of inspiration: |
| 71 | +https://arxiv.org/pdf/1810.08055 |
| 72 | +https://github.com/jupyter-guide/ten-rules-jupyter/tree/master/example1 |
| 73 | +https://github.com/jupyter-guide/jupyter-guide |
| 74 | +https://best-practice-and-impact.github.io/qa-of-code-guidance/managers_guide.html |
| 75 | +https://goodresearch.dev/ |
| 76 | +--> |
| 77 | + |
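
Two of the checklist items above, separating configuration from code via environment variables and including basic `assert` checks, can be illustrated with a short Python sketch. This is only one possible shape under stated assumptions: the variable names (`WORKFLOW_DATA_URL`, `WORKFLOW_OUTPUT_DIR`, `WORKFLOW_THRESHOLD`), the example URL, and the functions `load_config` and `fraction_above` are hypothetical and are not part of EarthCODE.

```python
import os
from pathlib import Path


def load_config(env=None):
    """Read settings from environment variables, falling back to defaults.

    All variable names and default values here are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    return {
        "data_url": env.get("WORKFLOW_DATA_URL", "https://example.org/data.csv"),
        "output_dir": Path(env.get("WORKFLOW_OUTPUT_DIR", "outputs")),
        "threshold": float(env.get("WORKFLOW_THRESHOLD", "0.5")),
    }


def fraction_above(values, threshold):
    """Return the fraction of values strictly above the threshold."""
    # Basic input checks: fail early with a clear message.
    assert len(values) > 0, "input must not be empty"
    assert all(v == v for v in values), "input contains NaN"  # NaN != NaN
    result = sum(v > threshold for v in values) / len(values)
    # Basic output check: a fraction must lie in [0, 1].
    assert 0.0 <= result <= 1.0, "fraction out of range"
    return result


if __name__ == "__main__":
    # Runs non-interactively: all settings come from the environment.
    config = load_config()
    sample = [0.1, 0.4, 0.6, 0.9]  # placeholder values, not real data
    print(fraction_above(sample, config["threshold"]))
```

Because `load_config` accepts an optional mapping, the same code can be exercised in a test with explicit values instead of the real environment, which keeps paths and endpoints out of the source while still allowing simple checks.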