Problem
The Run Pipeline GitHub Actions workflow (.github/workflows/pipeline.yaml) fails every time it runs on main. The pipeline is killed ~10 minutes into execution when the GitHub Actions runner timeout fires.
Root cause
The workflow uses `modal run --detach` to launch the pipeline:

```yaml
timeout-minutes: 10
...
run: |
  modal run --detach modal_app/pipeline.py::main $ARGS
```

`--detach` does not work the way we assumed. Specifically:
- `--detach` does not make the CLI return immediately after launching the app
- The CLI stays connected and streams logs to stdout (we observed Phase 1, Phase 2, and Phase 3 output in the GitHub Actions log)
- `--detach` only means "if the CLI process dies or disconnects, keep the Modal app running"
- However, when GitHub Actions fires its timeout, it sends an active cancellation signal to the step. This is different from the CLI simply disconnecting.
- The Modal CLI interprets this cancellation and stops the remote app, despite `--detach`
The pipeline takes 10+ hours end-to-end (dataset build, calibration, regional H5s, national H5s, staging, diagnostics). GitHub-hosted runners have a hard maximum of 6 hours per job, so even raising the timeout cannot solve this.
What we observed
From the GitHub Actions log of run #367 (workflow run 23567657966):
```
[Step 1/5] Building datasets...
=== Phase 1: Building independent datasets (parallel) ===
...
=== Phase 2: Building CPS and PUF (parallel) ===
...
=== Phase 3: Building extended CPS ===
Starting policyengine_us_data/datasets/cps/extended_cps.py...
Target variable 'estate_income_would_be_qualified' has constant value True...
Error: The operation was canceled.
```
The runner timed out at 10m14s. The Modal app was actively running and doing real work, but was killed by the cancellation.
Why other approaches don't work
| Approach | Problem |
|---|---|
| `timeout-minutes: 360` (6h max) | Pipeline takes 10+ hours |
| `nohup modal run --detach ... &` | Race condition: if the runner VM is torn down before the image build completes, the Modal app never launches. It is also untested whether `--detach` actually survives process cleanup in GHA. |
| `modal run --detach` (current) | CLI streams logs and dies on timeout cancellation |
Fix: modal deploy + .spawn()
PR #635 implements this fix. The approach:
- `modal deploy modal_app/pipeline.py`: registers the app as a persistent deployment on Modal. This builds the image, uploads the code, and makes all `@app.function()` functions callable remotely. The deploy step takes 1-3 minutes.
- `run_pipeline.spawn(branch='main', ...)`: calls the deployed `run_pipeline` function in fire-and-forget mode. `.spawn()` submits the job to Modal and immediately returns a `FunctionCall` handle, so the GitHub Actions step exits within seconds.
- The pipeline runs on Modal infrastructure for as long as it needs (up to `run_pipeline`'s `timeout=86400`, i.e. 24 hours), completely untethered from the GitHub runner.
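If the inline `python -c` snippet in the workflow step grows, the spawn call could move into a small helper module. The sketch below is not from PR #635: the module name `launch_pipeline.py` and the helpers `spawn_kwargs`, `describe`, `launch`, and `status` are hypothetical, only the `branch` argument comes from this issue, and it assumes the app is deployed as `policyengine-us-data-pipeline`. It also prints the `FunctionCall` ID so the run can be looked up later with `modal.FunctionCall.from_id`. The polling behavior (`.get(timeout=0)` raising `TimeoutError` while the call is still running) follows Modal's documented job-queue pattern; verify it against the Modal version you use.

```python
# launch_pipeline.py (hypothetical helper module; not part of PR #635).
# Assumes `modal deploy modal_app/pipeline.py` has registered an app named
# 'policyengine-us-data-pipeline' that exposes a function 'run_pipeline'.

APP_NAME = "policyengine-us-data-pipeline"
FUNCTION_NAME = "run_pipeline"


def spawn_kwargs(branch: str, **extra) -> dict:
    """Build keyword arguments for run_pipeline.spawn().

    Only `branch` comes from the issue; anything in `extra` is caller-supplied.
    """
    return {"branch": branch, **extra}


def describe(call) -> str:
    """Classify any object exposing a Modal-style .get(timeout=...) method."""
    try:
        # timeout=0 polls instead of blocking; a still-running call is
        # assumed to raise TimeoutError (as in Modal's job-queue docs).
        call.get(timeout=0)
    except TimeoutError:
        return "running"
    # Returning normally means the call finished. If the remote function
    # failed, .get() is expected to re-raise that error instead.
    return "finished"


def launch(branch: str = "main", **extra) -> str:
    """Spawn run_pipeline on the deployed app and return the call ID."""
    import modal  # Lazy import so the helpers above work without Modal.

    fn = modal.Function.from_name(APP_NAME, FUNCTION_NAME)
    call = fn.spawn(**spawn_kwargs(branch, **extra))  # Returns immediately.
    print(f"Spawned {FUNCTION_NAME}: {call.object_id}")
    return call.object_id


def status(call_id: str) -> str:
    """Look up a previously spawned call by its ID and report its state."""
    import modal

    return describe(modal.FunctionCall.from_id(call_id))
```

With this module on the import path, the workflow's inline snippet would shrink to `python -c "from launch_pipeline import launch; launch('main')"`, and `status('<call id>')` can be run later from any machine with credentials for the same Modal workspace. Keeping `import modal` inside `launch` and `status` is a deliberate choice so that `describe` can be unit-tested with a stand-in object.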
```yaml
- name: Deploy and launch pipeline on Modal
  run: |
    modal deploy modal_app/pipeline.py
    python -c "
    import modal
    run_pipeline = modal.Function.from_name('policyengine-us-data-pipeline', 'run_pipeline')
    run_pipeline.spawn(branch='main', ...)
    print('Pipeline spawned. Monitor on the Modal dashboard.')
    "
```

Key Modal concepts for future reference
- `modal run` creates an ephemeral app: it exists only while the CLI is connected. `--detach` makes it survive a disconnect but not an active cancellation.
- `modal deploy` creates a persistent app: it stays registered on Modal until explicitly stopped. Functions can be invoked from anywhere via `.spawn()`, `.remote()`, or web endpoints.
- `.spawn()` is Modal's fire-and-forget mechanism: it submits a function call and returns a `FunctionCall` handle immediately without waiting for completion.
- `.remote()` calls a function and blocks until it returns, equivalent to what `modal run` does under the hood.
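The `.spawn()` versus `.remote()` distinction has a close local analogue in Python's standard `concurrent.futures`: `submit()` hands back a `Future` right away (much like a `FunctionCall` handle), while waiting on `.result()` blocks until the work is done (much like `.remote()`). The sketch below is only that analogy; it does not use Modal, and `slow_stage`, `spawn_style`, and `remote_style` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_stage(seconds: float) -> str:
    # Hypothetical stand-in for a long pipeline step.
    time.sleep(seconds)
    return "done"


def spawn_style(pool: ThreadPoolExecutor, seconds: float):
    """Like .spawn(): submit the work and return a handle right away."""
    start = time.monotonic()
    handle = pool.submit(slow_stage, seconds)
    return handle, time.monotonic() - start


def remote_style(pool: ThreadPoolExecutor, seconds: float):
    """Like .remote(): submit the work and block until its result arrives."""
    start = time.monotonic()
    result = pool.submit(slow_stage, seconds).result()
    return result, time.monotonic() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        handle, launch_time = spawn_style(pool, 0.5)
        print(f"spawn-style returned after {launch_time:.3f}s; done={handle.done()}")
        result, wait_time = remote_style(pool, 0.5)
        print(f"remote-style returned {result!r} after {wait_time:.3f}s")
```

The analogy stops at process lifetime: leaving the `with` block waits for outstanding futures, and local threads die with the process anyway. A spawned Modal call keeps running on Modal after the launching process exits, which is the property the fix relies on.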
Timeline
- Pipeline workflow added in commit `4991b1e` (PR "Harden Modal pipeline: pre-baked images, auto-trigger on merge, at-large CD fix" #611, merged 2026-03-25)
- First run on main: workflow run `23567657966`, failed at 10m14s
- Fix: PR "Use modal deploy + spawn to decouple pipeline from GitHub runner" #635