
Documentation Website #151

Draft

douglowe wants to merge 6 commits into develop from 122-documentation-website


@douglowe
Collaborator

This will be a website hosted on GitHub Pages. It is currently just a skeleton setup.

Copilot AI review requested due to automatic review settings May 11, 2026 12:20
@douglowe douglowe marked this pull request as draft May 11, 2026 12:20

Copilot AI left a comment


Pull request overview

Introduces an initial skeleton for a GitHub Pages–hosted documentation website under docs/website-src, using Next.js + Nextra, and adds a GitHub Actions workflow to verify the docs site builds on PRs.

Changes:

  • Added Nextra/Next.js docs site scaffold (config + basic pages).
  • Added initial site navigation metadata and app wrapper.
  • Added a PR workflow to install dependencies and build/upload the Pages artifact.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.

Files changed:

  • docs/website-src/theme.config.tsx: Nextra theme configuration for repo links, sidebar, header/footer.
  • docs/website-src/pages/index.mdx: Landing page content scaffold.
  • docs/website-src/pages/crate_validator.mdx: Placeholder page for validator documentation.
  • docs/website-src/pages/_meta.js: Sidebar/page metadata configuration.
  • docs/website-src/pages/_app.jsx: Next.js custom App wrapper.
  • docs/website-src/package.json: Docs site dependencies and scripts.
  • docs/website-src/next.config.mjs: Next.js + Nextra integration and static export settings.
  • .github/workflows/check.build.docs.yml: CI workflow to build (and currently upload) the docs site output.


Comment thread .github/workflows/check.build.docs.yml Outdated
Comment thread .github/workflows/check.build.docs.yml Outdated
Comment thread .github/workflows/check.build.docs.yml Outdated
Review thread context:

    import { useConfig } from 'nextra-theme-docs'

    export default {
      docsRepositoryBase: 'https://github.com/eScienceLab/Cratey-Validator/tree/main/docs/website',

Review thread context (@@ -0,0 +1,23 @@):

    import React from "react";

Comment on lines +15 to +17:

    head() {
      const { frontMatter } = useConfig()

Comment on lines +1 to +3:

    import { Cards } from 'nextra/components'
    import { Steps,Callout } from "nextra/components"

  pull_request:
    paths:
      - "docs/website-src/**"

Could we just have docs/ be the root? There is almost nothing in there, and the stuff in docs/assets/ will be useful for the site anyway.

Suggested change, replacing:
      - "docs/website-src/**"
with:
      - "docs/**"

Collaborator Author


We probably could. I was following the deployment system for the original docs page and trying to fit its logic into our filesystem, but I must admit I have followed the workflow by rote rather than considering whether we need every step.

In the deployment workflow we have these steps:

      - name: Copy built files to docs
        run: |
          rm -rf docs/website/*
          cp -r docs/website-src/out docs/website

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/website

If we move to using docs rather than docs/website-src as the root, then perhaps we could get rid of the copy step and upload directly from the docs/website-src/out path (or, as it would become, docs/out)?


Yeah, that makes sense to me.

@EttoreM
Contributor

EttoreM commented May 14, 2026

This is the page added in commit fb48e50:


RO-Crate Validator Service

Date: 2026-05-14

Service Snapshot

Cratey Validator is a web service that checks whether RO-Crates follow the expected structure and metadata rules. It can validate a full RO-Crate stored in MinIO-compatible object storage, or it can validate only the contents of an ro-crate-metadata.json file.

At a high level, a client sends a validation request, the service runs the RO-Crate validation checks, and the result is either returned directly or saved so it can be retrieved later.

The service is exposed through the following three endpoints:

  • POST /v1/ro_crates/{crate_id}/validation
    Validates an RO-Crate that already exists in MinIO. The service queues a background task, downloads the crate, runs the validation checks, and saves the validation result back to MinIO.

  • POST /v1/ro_crates/validate_metadata
    Validates RO-Crate metadata directly. This route is useful when the caller only needs to check the crate metadata file and does not need the service to download a full crate from object storage.

  • GET /v1/ro_crates/{crate_id}/validation
    Retrieves the saved validation result for an RO-Crate that was previously validated from MinIO.

Current Architecture

The service is built as a Flask/APIFlask web API backed by Celery workers for longer-running validation jobs. RO-Crates are read from MinIO-compatible object storage, Redis is used by Celery to pass work between the API and the worker, and validation is performed by the CRS4 rocrate_validator library.

The project is organized into a few clear areas:

Application entrypoints and routes:

  • cratey.py starts the Flask application.
  • app/__init__.py creates the APIFlask app, registers route blueprints, loads environment-specific config, and wires Celery into the Flask context.
  • app/ro_crates/routes/post_routes.py exposes validation request endpoints.
  • app/ro_crates/routes/get_routes.py exposes validation result retrieval.

Validation workflow:

  • app/services/validation_service.py performs request-level validation, object existence checks, and queues Celery tasks.
  • app/tasks/validation_tasks.py runs the actual RO-Crate and metadata validation workflows.

Storage helpers:

  • app/utils/minio_utils.py handles MinIO client setup, object discovery, download, upload, and result retrieval.

Container and development setup:

  • docker-compose.yml runs the published container image with Flask, Celery, Redis, and MinIO.
  • docker-compose-develop.yml builds the local Dockerfile and mounts a local profile directory for development.

Runtime Flow

Validate RO-Crate From MinIO

  1. Client sends a request to POST /v1/ro_crates/{crate_id}/validation.
  2. Request includes minio_config and optional root_path and profile_name.
  3. Flask route passes the request to queue_ro_crate_validation_task.
  4. Service creates a MinIO client and checks that the target crate exists.
  5. Celery queues process_validation_task_by_id.
  6. Worker downloads the RO-Crate into a temporary local path.
  7. Worker runs rocrate_validator.services.validate.
  8. Validation JSON is uploaded to MinIO at:
    • {crate_id}_validation/validation_status.txt, or
    • {root_path}/{crate_id}_validation/validation_status.txt when root_path is supplied
  9. Temporary files are removed.
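The object name in step 8 can be expressed as a small helper. This is a sketch of the documented layout, not the code in app/utils/minio_utils.py; the function name validation_result_key is hypothetical, and so is the choice to ignore leading and trailing slashes on root_path.

```python
from typing import Optional


def validation_result_key(crate_id: str, root_path: Optional[str] = None) -> str:
    """Return the object name for a crate's stored validation result.

    Follows the two documented layouts:
      {crate_id}_validation/validation_status.txt
      {root_path}/{crate_id}_validation/validation_status.txt
    """
    key = f"{crate_id}_validation/validation_status.txt"
    # Assumption: surrounding slashes on root_path are ignored, so
    # "projects/2026/" and "/projects/2026" give the same key.
    root = (root_path or "").strip("/")
    return f"{root}/{key}" if root else key


print(validation_result_key("my-crate"))
# my-crate_validation/validation_status.txt
print(validation_result_key("my-crate", "projects/2026"))
# projects/2026/my-crate_validation/validation_status.txt
```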

Retrieve Validation Result

  1. Client sends GET /v1/ro_crates/{crate_id}/validation.
  2. Request includes minio_config and optional root_path.
  3. Service checks that both the RO-Crate and validation result exist in MinIO.
  4. Stored validation JSON is returned to the client.
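Because POST /v1/ro_crates/{crate_id}/validation only queues the job, a client that wants the result has to come back for it. A minimal polling sketch, assuming that a 400 from the GET endpoint means the result has not been written yet (it can also mean the crate itself is missing, hence the bounded number of attempts). The function name, attempt count and delay are illustrative choices, and the fetch callable stands in for whatever HTTP client performs the GET.

```python
import time
from typing import Any, Callable, Optional, Tuple

# Performs GET /v1/ro_crates/{crate_id}/validation and returns
# (HTTP status code, decoded JSON body or None).
FetchResult = Callable[[], Tuple[int, Optional[Any]]]


def wait_for_validation_result(
    fetch: FetchResult,
    attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll the result endpoint until it returns 200, then return the body."""
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status != 400:
            # A 500 (or anything unexpected) will not fix itself: stop polling.
            raise RuntimeError(f"unexpected status {status}: {body!r}")
        if attempt < attempts:
            sleep(delay_seconds)  # assumed "not written yet": wait and retry
    raise TimeoutError(f"no validation result after {attempts} attempts")


# Usage with a fake fetch: two "not ready" answers, then a result.
# The result dict is a placeholder, not the real validator output format.
replies = iter([(400, None), (400, None), (200, {"placeholder": "result"})])
print(wait_for_validation_result(lambda: next(replies), sleep=lambda s: None))
# {'placeholder': 'result'}
```

Injecting sleep keeps the loop testable without real delays.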

Validate Metadata Directly

  1. Client sends POST /v1/ro_crates/validate_metadata.
  2. Request includes crate_json and optional profile_name.
  3. Service verifies that crate_json is present, valid JSON, and non-empty.
  4. Celery runs process_validation_task_by_metadata.
  5. The API waits for the Celery result and returns the validation output synchronously.
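Step 3 can be sketched as a plain function. This mirrors the description above rather than the service's own code in app/services/validation_service.py; the function name and error messages are hypothetical, and rejecting a non-string crate_json is an assumption based on the field being documented as a string. A failure here corresponds to the 422 response listed for this endpoint.

```python
import json
from typing import Any, Optional, Tuple


def check_crate_json(payload: Any) -> Tuple[bool, Optional[str]]:
    """Return (ok, error_message) for a validate_metadata request body."""
    if not isinstance(payload, dict) or "crate_json" not in payload:
        return False, "crate_json is required"
    raw = payload["crate_json"]
    if not isinstance(raw, str):
        # Assumption: the field must be the *stringified* metadata.
        return False, "crate_json must be a string"
    if not raw.strip():
        return False, "crate_json is empty"
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"crate_json is not valid JSON: {exc.msg}"
    if not parsed:
        # Empty values such as {} or [] parse fine but leave nothing to validate.
        return False, "crate_json is empty"
    return True, None


print(check_crate_json({"crate_json": "{}"}))               # (False, 'crate_json is empty')
print(check_crate_json({"crate_json": '{"@graph": []}'}))   # (True, None)
```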

API Surface

POST /v1/ro_crates/{crate_id}/validation

Queues validation for an RO-Crate stored in MinIO. The crate_id path parameter is the name used to find the crate object in the bucket given in minio_config.

Request body:

{
  "minio_config": {
    "endpoint": "string",   // required, e.g. "localhost:9000" or "minio:9000"
    "accesskey": "string",  // required, MinIO access key or username
    "secret": "string",     // required, MinIO secret key or password
    "ssl": false,           // required, true when the MinIO endpoint uses HTTPS
    "bucket": "string"      // required, bucket containing the RO-Crate
  },
  "root_path": "string",     // optional folder/path inside the bucket
  "profile_name": "string"   // optional validation profile name
}

Expected responses:

  • 202: validation queued.
  • 400: RO-Crate does not exist or validation request cannot be satisfied.
  • 500: internal service, MinIO, Celery, or validation error.
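A sketch of building this request with Python's standard library, assuming the local Docker setup (API on http://localhost:5001). The helper name, the placeholder credentials and the URL-quoting of crate_id are illustrative assumptions; the request is only built here, and urllib.request.urlopen(req) would send it.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_validation_request(
    crate_id: str,
    minio_config: dict,
    root_path: Optional[str] = None,
    profile_name: Optional[str] = None,
    base_url: str = "http://localhost:5001",
) -> urllib.request.Request:
    """Build (but do not send) POST /v1/ro_crates/{crate_id}/validation."""
    body = {"minio_config": minio_config}
    # Optional fields are only included when the caller sets them.
    if root_path is not None:
        body["root_path"] = root_path
    if profile_name is not None:
        body["profile_name"] = profile_name
    # Assumption: quote crate_id so "/" or spaces stay inside one path segment.
    path = f"/v1/ro_crates/{urllib.parse.quote(crate_id, safe='')}/validation"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_validation_request(
    "my-crate",
    {
        "endpoint": "minio:9000",
        "accesskey": "ACCESS_KEY",  # placeholder
        "secret": "SECRET_KEY",     # placeholder
        "ssl": False,
        "bucket": "ro-crates",
    },
)
print(req.get_method(), req.full_url)
# POST http://localhost:5001/v1/ro_crates/my-crate/validation
```

Building the Request separately from sending it keeps the payload easy to inspect; a 202 reply only means the job was queued.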

GET /v1/ro_crates/{crate_id}/validation

Fetches the latest validation result from MinIO.

Request body:

{
  "minio_config": {
    "endpoint": "string",   // required, e.g. "localhost:9000" or "minio:9000"
    "accesskey": "string",  // required, MinIO access key or username
    "secret": "string",     // required, MinIO secret key or password
    "ssl": false,           // required, true when the MinIO endpoint uses HTTPS
    "bucket": "string"      // required, bucket containing the RO-Crate
  },
  "root_path": "string"     // optional folder/path inside the bucket
}

Expected responses:

  • 200: validation result JSON returned.
  • 400: RO-Crate or validation result is missing.
  • 500: MinIO or internal retrieval error.
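This endpoint takes its MinIO details from a JSON body even though it is a GET. HTTP gives a body on GET no defined meaning, so some clients, proxies or caches may drop or reject it, and it is worth being explicit on the client side. A sketch with urllib, where the helper name and placeholder values are hypothetical:

```python
import json
import urllib.request
from typing import Optional


def build_result_request(
    crate_id: str,
    minio_config: dict,
    root_path: Optional[str] = None,
    base_url: str = "http://localhost:5001",
) -> urllib.request.Request:
    """Build (but do not send) GET /v1/ro_crates/{crate_id}/validation with a JSON body."""
    body = {"minio_config": minio_config}
    if root_path is not None:
        body["root_path"] = root_path
    return urllib.request.Request(
        f"{base_url}/v1/ro_crates/{crate_id}/validation",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        # urllib would switch to POST because data is set; force GET explicitly.
        method="GET",
    )


req = build_result_request(
    "my-crate",
    {"endpoint": "minio:9000", "accesskey": "ACCESS_KEY", "secret": "SECRET_KEY",
     "ssl": False, "bucket": "ro-crates"},  # placeholder credentials
    root_path="projects/2026",
)
print(req.get_method())  # GET
```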

POST /v1/ro_crates/validate_metadata

Validates a submitted RO-Crate metadata JSON string.

Request body:

{
  "crate_json": "string",   // required, stringified content of ro-crate-metadata.json
  "profile_name": "string"  // optional validation profile name
}

Expected responses:

  • 200: validation result returned.
  • 422: missing, malformed, or empty metadata JSON.
  • 500: internal validation error.
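crate_json is a string, so the metadata has to be serialised before it goes into the body; sending it as a nested JSON object is an easy mistake. A sketch, where the in-memory metadata is a minimal stand-in (the validator will report problems with it) and profile_name is left out because valid names depend on the installed profiles:

```python
import json
import urllib.request

# Minimal stand-in for the parsed contents of ro-crate-metadata.json.
metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [],
}

body = {
    # crate_json must be a string, so serialise the metadata first.
    # (If the file is on disk, its raw text can be sent unchanged.)
    "crate_json": json.dumps(metadata),
}

req = urllib.request.Request(
    "http://localhost:5001/v1/ro_crates/validate_metadata",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; this endpoint answers synchronously.
```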

Standing Up the Service

Local Development

The service can be started with Docker Compose:

docker compose up --build

For local container development, use:

docker compose --file docker-compose-develop.yml up --build

Expected local services:

  • Flask API: http://localhost:5001
  • MinIO API: http://localhost:9000
  • MinIO console: http://localhost:9001
  • Redis: localhost:6379
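A quick way to check that the containers are listening on the expected ports is a plain TCP connection attempt. This only shows that something accepts connections on each port, not that the service behind it is healthy; the script and its names are a hypothetical helper, not part of the repository.

```python
import socket

# Host/port pairs taken from the list of expected local services.
LOCAL_SERVICES = {
    "Flask API": ("localhost", 5001),
    "MinIO API": ("localhost", 9000),
    "MinIO console": ("localhost", 9001),
    "Redis": ("localhost", 6379),
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in LOCAL_SERVICES.items():
        state = "up" if port_open(host, port) else "not reachable"
        print(f"{name:14} {host}:{port} {state}")
```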

MinIO needs a bucket for RO-Crates, normally named ro-crates. Bucket versioning should be enabled so that uploaded crate objects can be tracked reliably.

Production Deployment

(To be confirmed: possibly the same as the local Docker Compose setup.)

Configuration

The main environment variables are:

  • FLASK_APP: Flask entrypoint, normally cratey.py.
  • FLASK_ENV: selects development or production config.
  • CELERY_BROKER_URL: Redis broker URL.
  • CELERY_RESULT_BACKEND: Redis result backend URL.
  • PROFILES_PATH: optional path to custom RO-Crate validator profile definitions.
  • MINIO_ENDPOINT: default MinIO endpoint used by Docker examples.
  • MINIO_ROOT_USER: MinIO root username for local development.
  • MINIO_ROOT_PASSWORD: MinIO root password for local development.
  • MINIO_BUCKET_NAME: default bucket name used by local setup.

API calls also pass MinIO access details in minio_config, so the service can validate crates in a specified object store and bucket.
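A pre-flight check for the environment can catch a missing broker URL before Flask or Celery start. Which variables are strictly required is an assumption here (the list above does not say), as is the check that PROFILES_PATH points at a directory; the function name is hypothetical.

```python
import os
from typing import List, Mapping

# Assumption: the API and worker cannot run without these.
REQUIRED_VARS = ("FLASK_APP", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND")


def check_environment(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of configuration problems (empty when none are found)."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not environ.get(name)]
    profiles_path = environ.get("PROFILES_PATH")
    # PROFILES_PATH is optional, but if given it should point at a directory.
    if profiles_path and not os.path.isdir(profiles_path):
        problems.append(f"PROFILES_PATH={profiles_path!r} is not a directory")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```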

Test Coverage

Current tests cover the main service seams:

  • API route request validation and route-to-service wiring.
  • Validation service queueing behavior and error handling.
  • Celery task behavior for RO-Crate validation and metadata validation.
  • MinIO helper behavior.
  • Integration-level service paths.

Run tests with:

pytest


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document how to use the Five Safes RO-Crate profile with the validation service

4 participants