
Add CI #272

Draft

a5dur wants to merge 6 commits into dathere:main from a5dur:main

Conversation


a5dur commented May 5, 2026

Infrastructure

The workflow spins up a complete CKAN stack inside the runner using GitHub Actions service containers:

| Service | Image |
| --- | --- |
| CKAN | `ckan/ckan-dev:2.11` (job container) |
| PostgreSQL | `postgres:15` |
| Solr | `ckan/ckan-solr:2.11-solr9` |
| Redis | `redis:3` |
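The stack above corresponds to a `services` block along these lines (a minimal sketch; the job name, environment variables, and omitted port mappings are assumptions, not copied from the actual workflow):

```yaml
jobs:
  integration-tests:
    runs-on: ubuntu-latest
    # CKAN itself runs as the job container, so every `run:` step executes inside it
    container: ckan/ckan-dev:2.11
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: ckan   # assumed value, not from the workflow
      solr:
        image: ckan/ckan-solr:2.11-solr9
      redis:
        image: redis:3
```

Because the job runs in a container, each service is reachable from the steps by its service name (e.g. host `postgres`), with no port mapping to the runner needed.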

Test File Sourcing

Test files are sourced directly from `dathere/datapusher-plus_testing` via GitHub raw URLs — no repository clone is required. The workflow uses the GitHub Contents API to discover files in the configured test directory, then passes the corresponding raw URLs directly to DataPusher+ for download and processing.
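A discovery step in this spirit might look like the following (a sketch, assuming `jq` is available in the job container and that the Contents API's `download_url` field is what gets handed to DataPusher+; `FILES_DIR` and `test_urls.txt` are illustrative names):

```yaml
- name: Discover test files
  env:
    FILES_DIR: quick
  run: |
    # List files in the configured test directory via the GitHub Contents API
    curl -fsSL \
      "https://api.github.com/repos/dathere/datapusher-plus_testing/contents/tests/${FILES_DIR}" \
      | jq -r '.[] | select(.type == "file") | .download_url' > test_urls.txt
    # Each line is now a raw URL that DataPusher+ can download directly
    wc -l test_urls.txt
```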

Only `log_analyzer.py` is fetched (single `curl` download) for post-run analysis.

How the CI Works — Step by Step

  1. System dependencies — all apt packages installed in one step (GDAL, libpq, geospatial libs, PostgreSQL client)
  2. Checkout — checks out the PR branch / commit being tested
  3. Fetch log analyzer — downloads `log_analyzer.py` from the testing repo
  4. Wait for PostgreSQL — polls until the database is accepting connections
  5. Database setup — creates CKAN users/databases; applies PostgreSQL 15 schema grants
  6. Install datapusher-plus — installs directly from the local checkout (`pip install -e .`), so the exact PR code is tested
  7. Install qsv — downloads the configured version of `qsvdp`; falls back to musl build if GNU fails
  8. Configure CKAN — patches `test-core.ini` with all DataPusher+ settings and plugin registration
  9. Initialise databases — `ckan db init` + `datastore set-permissions` + DataPusher+ migrations
  10. Start CKAN — launches the development server; waits until the API responds
  11. Create sysadmin + API tokens — creates `admin_ckan`, promotes to sysadmin, generates DataPusher+ API token
  12. Smoke-test DataStore — verifies read and write access before the full suite
  13. Start job worker — starts the CKAN background job worker (required for async processing)
  14. Run integration tests — discovers files via GitHub Contents API, creates CKAN resources, polls `datapusher_status`, checks `datastore_active` and row counts, records results
  15. Generate analysis — runs `log_analyzer.py` against worker logs; builds full step summary
  16. Upload artifacts — persists logs and CSVs for 7 days
  17. Cleanup — kills CKAN and worker processes by stored PID
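Step 4 above is typically a small polling loop; a minimal sketch (the host name `postgres` assumes the job runs in a container alongside the service, and the attempt count and sleep interval are illustrative):

```yaml
- name: Wait for PostgreSQL
  run: |
    # Poll until the database accepts connections, for up to ~60 seconds
    for i in $(seq 1 30); do
      pg_isready -h postgres -U postgres && exit 0
      echo "PostgreSQL not ready yet (attempt $i)"
      sleep 2
    done
    echo "PostgreSQL never became ready" >&2
    exit 1
```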

Configurable Inputs

| Input | Default | Description |
| --- | --- | --- |
| `testing_directory` | `quick` | Subdirectory within `dathere/datapusher-plus_testing/tests/` |
| `qsv_version` | `9.1.0` | Version of qsv/qsvdp to install |
| `polling_timeout_seconds` | `20` | Max seconds to wait per file |
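These map onto `workflow_dispatch` inputs roughly like so (a sketch reconstructed from the table above, not the workflow's literal text):

```yaml
on:
  workflow_dispatch:
    inputs:
      testing_directory:
        description: Subdirectory within dathere/datapusher-plus_testing/tests/
        default: quick
      qsv_version:
        description: Version of qsv/qsvdp to install
        default: "9.1.0"
      polling_timeout_seconds:
        description: Max seconds to wait per file
        default: "20"
```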

Workflow Step Summary

After each run the GitHub Actions step summary includes: summary table, test run results (all files including failures with error messages), complete job
analysis from worker logs, file format and encoding distribution, error analysis, performance anomalies, and overall verdict.

Artifacts

Uploaded as `dp-ci--<run_id>` with 7-day retention: `test_results.csv`, `worker_analysis.csv`, `ckan_stdout.log`, `ckan_worker.log`.
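An upload step matching that naming and retention could look like this (a sketch; the file paths assume the workflow writes its outputs to the workspace root):

```yaml
- name: Upload artifacts
  if: always()   # persist logs even when earlier steps fail
  uses: actions/upload-artifact@v4
  with:
    name: dp-ci--${{ github.run_id }}
    retention-days: 7
    path: |
      test_results.csv
      worker_analysis.csv
      ckan_stdout.log
      ckan_worker.log
```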


Key Design Decisions

Install from local checkout — `pip install -e $GITHUB_WORKSPACE` ensures the CI always tests the exact branch code, not a published release.

Test files via raw GitHub URLs — no external repo clone needed; DataPusher+ downloads files directly from `dathere/datapusher-plus_testing`,
mirroring real-world URL-based ingestion.

Graceful completion — the workflow always completes all steps regardless of test outcomes so artifacts and logs are always available for debugging.
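Graceful completion is typically achieved by putting `if: always()` on the trailing steps; a sketch of the cleanup step (the PID file names are illustrative, not taken from the workflow):

```yaml
- name: Cleanup
  if: always()   # runs even when tests or earlier steps fail
  run: |
    # Kill CKAN and the job worker using the PIDs stored at startup
    [ -f ckan.pid ]   && kill "$(cat ckan.pid)"   || true
    [ -f worker.pid ] && kill "$(cat worker.pid)" || true
```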

PostgreSQL 15 — `ckan/ckan-postgres-dev:2.11` was timing out on current runners; replaced with stock `postgres:15` and explicit health check
options.
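The "explicit health check options" are Docker flags passed through the service's `options:` key; a plausible configuration (the interval, retry, and start-period values are assumptions — the workflow's actual numbers may differ):

```yaml
services:
  postgres:
    image: postgres:15
    env:
      POSTGRES_PASSWORD: ckan   # assumed value
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 10
      --health-start-period 15s
```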

Node.js 24 opt-in — `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true` set ahead of the June 2, 2026 forced migration.
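The opt-in is a single environment variable, set at the workflow (or job) level:

```yaml
env:
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
```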


Related

🤖 Generated with Claude Code

a5dur and others added 6 commits May 5, 2026 17:05
Adds .github/workflows/ci.yml - a full DataPusher+ integration test harness
that runs on push/PR to main and nightly via cron.

Key differences from standalone testing repo workflow:
- Installs datapusher-plus from local checkout (tests the actual PR/commit)
- Clones dathere/datapusher-plus_testing for test files and log_analyzer.py
- Fails CI (exit 1) when any file fails to process through DataPusher+
- Concurrency cancel-in-progress=true per branch ref
- Artifact names include run_id to prevent collision across runs
- polling_timeout default 120s (vs 60s) for CI reliability
- Node.js 24 opt-in for actions/checkout and actions/upload-artifact

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move exit 1 out of the test step into a dedicated final gate step.
Test step now saves failure counts to GITHUB_ENV and always exits 0,
ensuring artifact upload, worker analysis, and cleanup always run.
The new "Check test results" step gates the job after all cleanup is done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace git clone of datapusher-plus_testing with:
- GitHub Contents API to discover test files in tests/$FILES_DIR
- GitHub raw URLs served directly to DataPusher+ (no local HTTP server)
- Single curl download of log_analyzer.py

Removes: clone step, python -m http.server, HTTP_SERVER_PID management.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ckan/ckan-postgres-dev:2.11 times out during health checks on current runners.
Switch to stock postgres:15 with explicit health-cmd, retries, and start-period,
matching the fix already applied in datapusher-plus_testing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…failures

- Replace thin job details table with full summary matching check.yml:
  skipped files, complete job analysis, file formats, encoding distribution,
  error analysis, performance anomalies, overall result section
- Remove exit 1 from gate step — workflow always succeeds, results are informational

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lts table

- POLLING_TIMEOUT default 120s -> 20s (matches testing repo behaviour)
- Guard max_attempts >= 1 to avoid divide-by-zero on very low timeouts
- Add "Test Run Results" table before Complete Job Analysis: shows ALL tested
  files (including those DataPusher never picked up) with upload status,
  DPP status, datastore active, rows, processing time, and error message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>