
Add CI #272

Draft

a5dur wants to merge 6 commits into dathere:main from a5dur:main

Conversation


a5dur commented May 5, 2026

Infrastructure

The workflow spins up a complete CKAN stack inside the runner using GitHub Actions service containers:

| Service | Image |
| --- | --- |
| CKAN | `ckan/ckan-dev:2.11` (job container) |
| PostgreSQL | `postgres:15` |
| Solr | `ckan/ckan-solr:2.11-solr9` |
| Redis | `redis:3` |
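The stack above corresponds to a `services` block along these lines (a minimal sketch; the job name, environment variables, and omitted port mappings are assumptions, not copied from the actual workflow):

```yaml
jobs:
  integration-tests:
    runs-on: ubuntu-latest
    # CKAN itself runs as the job container, so every `run:` step executes inside it
    container: ckan/ckan-dev:2.11
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: ckan   # assumed value, not from the workflow
      solr:
        image: ckan/ckan-solr:2.11-solr9
      redis:
        image: redis:3
```

Because the job runs in a container, each service is reachable from the steps by its service name (e.g. host `postgres`), with no port mapping to the runner needed.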

Test File Sourcing

Test files are sourced directly from `dathere/datapusher-plus_testing` via GitHub raw URLs — no repository clone is required. The workflow uses the GitHub Contents API to discover files in the configured test directory, then passes the corresponding raw URLs directly to DataPusher+ for download and processing.
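A discovery step in this spirit might look like the following (a sketch, assuming `jq` is available in the job container and that the Contents API's `download_url` field is what gets handed to DataPusher+; `FILES_DIR` and `test_urls.txt` are illustrative names):

```yaml
- name: Discover test files
  env:
    FILES_DIR: quick
  run: |
    # List files in the configured test directory via the GitHub Contents API
    curl -fsSL \
      "https://api.github.com/repos/dathere/datapusher-plus_testing/contents/tests/${FILES_DIR}" \
      | jq -r '.[] | select(.type == "file") | .download_url' > test_urls.txt
    # Each line is now a raw URL that DataPusher+ can download directly
    wc -l test_urls.txt
```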

Only `log_analyzer.py` is fetched (single `curl` download) for post-run analysis.

How the CI Works — Step by Step

  1. System dependencies — all apt packages installed in one step (GDAL, libpq, geospatial libs, PostgreSQL client)
  2. Checkout — checks out the PR branch / commit being tested
  3. Fetch log analyzer — downloads `log_analyzer.py` from the testing repo
  4. Wait for PostgreSQL — polls until the database is accepting connections
  5. Database setup — creates CKAN users/databases; applies PostgreSQL 15 schema grants
  6. Install datapusher-plus — installs directly from the local checkout (`pip install -e .`), so the exact PR code is tested
  7. Install qsv — downloads the configured version of `qsvdp`; falls back to musl build if GNU fails
  8. Configure CKAN — patches `test-core.ini` with all DataPusher+ settings and plugin registration
  9. Initialise databases — `ckan db init` + `datastore set-permissions` + DataPusher+ migrations
  10. Start CKAN — launches the development server; waits until the API responds
  11. Create sysadmin + API tokens — creates `admin_ckan`, promotes to sysadmin, generates DataPusher+ API token
  12. Smoke-test DataStore — verifies read and write access before the full suite
  13. Start job worker — starts the CKAN background job worker (required for async processing)
  14. Run integration tests — discovers files via GitHub Contents API, creates CKAN resources, polls `datapusher_status`, checks `datastore_active` and row counts, records results
  15. Generate analysis — runs `log_analyzer.py` against worker logs; builds full step summary
  16. Upload artifacts — persists logs and CSVs for 7 days
  17. Cleanup — kills CKAN and worker processes by stored PID
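Step 4 above is typically a small polling loop; a minimal sketch (the host name `postgres` assumes the job runs in a container alongside the service, and the attempt count and sleep interval are illustrative):

```yaml
- name: Wait for PostgreSQL
  run: |
    # Poll until the database accepts connections, for up to ~60 seconds
    for i in $(seq 1 30); do
      pg_isready -h postgres -U postgres && exit 0
      echo "PostgreSQL not ready yet (attempt $i)"
      sleep 2
    done
    echo "PostgreSQL never became ready" >&2
    exit 1
```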

Configurable Inputs

| Input | Default | Description |
| --- | --- | --- |
| `testing_directory` | `quick` | Subdirectory within `dathere/datapusher-plus_testing/tests/` |
| `qsv_version` | `9.1.0` | Version of qsv/qsvdp to install |
| `polling_timeout_seconds` | `20` | Max seconds to wait per file |
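These map onto `workflow_dispatch` inputs roughly like so (a sketch reconstructed from the table above, not the workflow's literal text):

```yaml
on:
  workflow_dispatch:
    inputs:
      testing_directory:
        description: Subdirectory within dathere/datapusher-plus_testing/tests/
        default: quick
      qsv_version:
        description: Version of qsv/qsvdp to install
        default: "9.1.0"
      polling_timeout_seconds:
        description: Max seconds to wait per file
        default: "20"
```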

Workflow Step Summary

After each run the GitHub Actions step summary includes: summary table, test run results (all files including failures with error messages), complete job
analysis from worker logs, file format and encoding distribution, error analysis, performance anomalies, and overall verdict.

Artifacts

Uploaded as `dp-ci--<run_id>` with 7-day retention: `test_results.csv`, `worker_analysis.csv`, `ckan_stdout.log`, `ckan_worker.log`.
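An upload step matching that naming and retention could look like this (a sketch; the file paths assume the workflow writes its outputs to the workspace root):

```yaml
- name: Upload artifacts
  if: always()   # persist logs even when earlier steps fail
  uses: actions/upload-artifact@v4
  with:
    name: dp-ci--${{ github.run_id }}
    retention-days: 7
    path: |
      test_results.csv
      worker_analysis.csv
      ckan_stdout.log
      ckan_worker.log
```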


Key Design Decisions

Install from local checkout — `pip install -e $GITHUB_WORKSPACE` ensures the CI always tests the exact branch code, not a published release.

Test files via raw GitHub URLs — no external repo clone needed; DataPusher+ downloads files directly from `dathere/datapusher-plus_testing`,
mirroring real-world URL-based ingestion.

Graceful completion — the workflow always completes all steps regardless of test outcomes so artifacts and logs are always available for debugging.
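Graceful completion is typically achieved by putting `if: always()` on the trailing steps; a sketch of the cleanup step (the PID file names are illustrative, not taken from the workflow):

```yaml
- name: Cleanup
  if: always()   # runs even when tests or earlier steps fail
  run: |
    # Kill CKAN and the job worker using the PIDs stored at startup
    [ -f ckan.pid ]   && kill "$(cat ckan.pid)"   || true
    [ -f worker.pid ] && kill "$(cat worker.pid)" || true
```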

PostgreSQL 15 — `ckan/ckan-postgres-dev:2.11` was timing out on current runners; replaced with stock `postgres:15` and explicit health check
options.
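The "explicit health check options" are Docker flags passed through the service's `options:` key; a plausible configuration (the interval, retry, and start-period values are assumptions — the workflow's actual numbers may differ):

```yaml
services:
  postgres:
    image: postgres:15
    env:
      POSTGRES_PASSWORD: ckan   # assumed value
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 10
      --health-start-period 15s
```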

Node.js 24 opt-in — `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true` set ahead of the June 2, 2026 forced migration.
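The opt-in is a single environment variable, set at the workflow (or job) level:

```yaml
env:
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
```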


Related

🤖 Generated with Claude Code

a5dur and others added 6 commits May 5, 2026 17:05
Adds .github/workflows/ci.yml - a full DataPusher+ integration test harness
that runs on push/PR to main and nightly via cron.

Key differences from standalone testing repo workflow:
- Installs datapusher-plus from local checkout (tests the actual PR/commit)
- Clones dathere/datapusher-plus_testing for test files and log_analyzer.py
- Fails CI (exit 1) when any file fails to process through DataPusher+
- Concurrency cancel-in-progress=true per branch ref
- Artifact names include run_id to prevent collision across runs
- polling_timeout default 120s (vs 60s) for CI reliability
- Node.js 24 opt-in for actions/checkout and actions/upload-artifact

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move exit 1 out of the test step into a dedicated final gate step.
Test step now saves failure counts to GITHUB_ENV and always exits 0,
ensuring artifact upload, worker analysis, and cleanup always run.
The new "Check test results" step gates the job after all cleanup is done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace git clone of datapusher-plus_testing with:
- GitHub Contents API to discover test files in tests/$FILES_DIR
- GitHub raw URLs served directly to DataPusher+ (no local HTTP server)
- Single curl download of log_analyzer.py

Removes: clone step, python -m http.server, HTTP_SERVER_PID management.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ckan/ckan-postgres-dev:2.11 times out during health checks on current runners.
Switch to stock postgres:15 with explicit health-cmd, retries, and start-period,
matching the fix already applied in datapusher-plus_testing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…failures

- Replace thin job details table with full summary matching check.yml:
  skipped files, complete job analysis, file formats, encoding distribution,
  error analysis, performance anomalies, overall result section
- Remove exit 1 from gate step — workflow always succeeds, results are informational

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lts table

- POLLING_TIMEOUT default 120s -> 20s (matches testing repo behaviour)
- Guard max_attempts >= 1 to avoid divide-by-zero on very low timeouts
- Add "Test Run Results" table before Complete Job Analysis: shows ALL tested
  files (including those DataPusher never picked up) with upload status,
  DPP status, datastore active, rows, processing time, and error message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>