Add CI #272
Draft
a5dur wants to merge 6 commits into dathere:main
Adds .github/workflows/ci.yml - a full DataPusher+ integration test harness that runs on push/PR to main and nightly via cron.

Key differences from standalone testing repo workflow:
- Installs datapusher-plus from local checkout (tests the actual PR/commit)
- Clones dathere/datapusher-plus_testing for test files and log_analyzer.py
- Fails CI (exit 1) when any file fails to process through DataPusher+
- Concurrency cancel-in-progress=true per branch ref
- Artifact names include run_id to prevent collision across runs
- polling_timeout default 120s (vs 60s) for CI reliability
- Node.js 24 opt-in for actions/checkout and actions/upload-artifact

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move exit 1 out of the test step into a dedicated final gate step. The test step now saves failure counts to GITHUB_ENV and always exits 0, ensuring artifact upload, worker analysis, and cleanup always run. The new "Check test results" step gates the job after all cleanup is done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
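The record-then-gate pattern described in this commit can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual code: the variable names `DPP_PASSED` and `DPP_FAILED` and both helper functions are hypothetical. The only behaviour taken from GitHub Actions itself is that appending `NAME=value` lines to the file named by `$GITHUB_ENV` exposes those variables to later steps.

```python
import os
import tempfile


def record_counts(env_file: str, passed: int, failed: int) -> None:
    """Append NAME=value lines, the format GitHub Actions reads from $GITHUB_ENV."""
    # DPP_PASSED / DPP_FAILED are hypothetical names chosen for this sketch.
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"DPP_PASSED={passed}\n")
        fh.write(f"DPP_FAILED={failed}\n")


def gate_exit_code(env: dict) -> int:
    """Return 1 if any file failed, else 0 (a missing variable counts as 0 failures)."""
    return 1 if int(env.get("DPP_FAILED", "0")) > 0 else 0


if __name__ == "__main__":
    # Use a temporary file so the sketch also runs outside a workflow.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    record_counts(path, passed=10, failed=2)
    print(open(path, encoding="utf-8").read())
    print("gate exit code:", gate_exit_code({"DPP_FAILED": "2"}))
```

Keeping the test step itself at exit code 0 and deferring the decision to a final step is what lets upload and cleanup steps run unconditionally in between.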
Replace git clone of datapusher-plus_testing with:
- GitHub Contents API to discover test files in tests/$FILES_DIR
- GitHub raw URLs served directly to DataPusher+ (no local HTTP server)
- Single curl download of log_analyzer.py

Removes: clone step, python -m http.server, HTTP_SERVER_PID management.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ckan/ckan-postgres-dev:2.11 times out during health checks on current runners. Switch to stock postgres:15 with explicit health-cmd, retries, and start-period, matching the fix already applied in datapusher-plus_testing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…failures
- Replace thin job details table with full summary matching check.yml: skipped files, complete job analysis, file formats, encoding distribution, error analysis, performance anomalies, overall result section
- Remove exit 1 from gate step: workflow always succeeds, results are informational

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lts table
- POLLING_TIMEOUT default 120s -> 20s (matches testing repo behaviour)
- Guard max_attempts >= 1 to avoid divide-by-zero on very low timeouts
- Add "Test Run Results" table before Complete Job Analysis: shows ALL tested files (including those DataPusher never picked up) with upload status, DPP status, datastore active, rows, processing time, and error message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
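The `max_attempts >= 1` guard can be illustrated with a short Python sketch. The polling interval, the function names, and the injectable `sleep` parameter are assumptions made for this example; the commit does not state how the workflow computes or uses the attempt count.

```python
import time
from typing import Callable


def max_attempts(timeout_s: int, interval_s: int) -> int:
    """Number of polls that fit in the timeout, never fewer than one."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Integer division yields 0 when timeout_s < interval_s; clamp to 1 so
    # at least one poll happens and later divisions by the count are safe.
    return max(1, timeout_s // interval_s)


def poll(is_done: Callable[[], bool], timeout_s: int, interval_s: int,
         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call is_done up to max_attempts times, sleeping between tries."""
    attempts = max_attempts(timeout_s, interval_s)
    for attempt in range(attempts):
        if is_done():
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

With a hypothetical 5-second interval, the 20s default gives four attempts, while a 2s timeout still gives one attempt instead of zero.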
Infrastructure
The workflow spins up a complete CKAN stack inside the runner using GitHub Actions service containers:
Test File Sourcing
Test files are sourced directly from `dathere/datapusher-plus_testing` via GitHub raw URLs, so no repository clone is required. The workflow uses the GitHub Contents API to discover files in the configured test directory, then passes their raw URLs directly to DataPusher+ for download and processing.
The only file downloaded to the runner is `log_analyzer.py` (a single `curl` download), which is used for post-run analysis.
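The discovery step can be sketched in Python as below. It assumes the listing has already been fetched from the Contents API (`GET /repos/{owner}/{repo}/contents/{path}`), which returns one JSON object per directory entry with `type` and `download_url` fields. The function name, the sample file names, and the `main`/`tests/quick` path in the sample URLs are hypothetical.

```python
from typing import Iterable


def raw_urls(listing: Iterable[dict]) -> list[str]:
    """Pick the raw download URLs of regular files from a directory listing."""
    return [
        entry["download_url"]
        for entry in listing
        # Subdirectories have type "dir" and a null download_url; skip them.
        if entry.get("type") == "file" and entry.get("download_url")
    ]


if __name__ == "__main__":
    # Hypothetical listing shaped like a Contents API response.
    base = "https://raw.githubusercontent.com/dathere/datapusher-plus_testing/main/tests/quick"
    sample = [
        {"name": "a.csv", "type": "file", "download_url": f"{base}/a.csv"},
        {"name": "nested", "type": "dir", "download_url": None},
        {"name": "b.xlsx", "type": "file", "download_url": f"{base}/b.xlsx"},
    ]
    for url in raw_urls(sample):
        print(url)
```

Each returned URL would then be registered as a CKAN resource URL, so DataPusher+ fetches the file itself rather than reading it from the runner's disk.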
How the CI Works — Step by Step
and row counts, records results
Configurable Inputs
Workflow Step Summary
After each run the GitHub Actions step summary includes: summary table, test run results (all files including failures with error messages), complete job
analysis from worker logs, file format and encoding distribution, error analysis, performance anomalies, and overall verdict.
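A results table like the one in the step summary can be rendered as Markdown and appended to the file named by `$GITHUB_STEP_SUMMARY`, which GitHub Actions displays on the run page. The sketch below is illustrative only: the column set follows the "Test Run Results" description in this PR, but the dictionary keys and the function name are hypothetical.

```python
def results_table(rows: list[dict]) -> str:
    """Render per-file results as a Markdown table."""
    # (header, key) pairs; the keys are assumptions made for this sketch.
    columns = [
        ("File", "file"),
        ("Upload", "upload"),
        ("DPP status", "dpp_status"),
        ("Datastore active", "datastore_active"),
        ("Rows", "rows"),
        ("Time (s)", "seconds"),
        ("Error", "error"),
    ]
    lines = [
        "| " + " | ".join(header for header, _ in columns) + " |",
        "|" + "|".join("---" for _ in columns) + "|",
    ]
    for row in rows:
        cells = []
        for _, key in columns:
            text = str(row.get(key, ""))
            # Escape pipes and flatten newlines so one result stays on one row.
            cells.append(text.replace("|", "\\|").replace("\n", " "))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"
```

In a workflow step, the returned string would be appended to the path stored in the `GITHUB_STEP_SUMMARY` environment variable. Files that DataPusher+ never picked up can still appear as rows, with empty status cells and an error message.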
Artifacts
Uploaded as `dp-ci--<run_id>` with 7-day retention: `test_results.csv`, `worker_analysis.csv`, `ckan_stdout.log`, `ckan_worker.log`.
Key Design Decisions
Install from local checkout — `pip install -e $GITHUB_WORKSPACE` ensures the CI always tests the exact branch code, not a published release.
Test files via raw GitHub URLs — no external repo clone needed; DataPusher+ downloads files directly from `dathere/datapusher-plus_testing`,
mirroring real-world URL-based ingestion.
Graceful completion — the workflow always completes all steps regardless of test outcomes so artifacts and logs are always available for debugging.
PostgreSQL 15 — `ckan/ckan-postgres-dev:2.11` was timing out on current runners; replaced with stock `postgres:15` and explicit health check
options.
Node.js 24 opt-in — `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true` set ahead of the June 2, 2026 forced migration.
Related
🤖 Generated with Claude Code