ci: consolidate self-hosted build; wire pytest-xdist (off by default)#901
Open
thomasrockhu-codecov wants to merge 3 commits into
Open
ci: consolidate self-hosted build; wire pytest-xdist (off by default)#901thomasrockhu-codecov wants to merge 3 commits into
thomasrockhu-codecov wants to merge 3 commits into
Conversation
…ted build
Two ci-router runtime improvements (items 1 and 3+4 from the speedup plan):
1. Parallelize API and Shared test jobs with pytest-xdist (-n auto)
The API test job is the longest single job on the critical path
(~6 minutes). Both API and Shared tests are pytest-django suites whose
per-worker test DB suffixing is supported out of the box, so we can opt
into pytest-xdist with minimal risk. Worker stays serial because its
SQLAlchemy DB setup in apps/worker/conftest.py uses the deprecated
`slaveinput` API and mutates `engine.url` (immutable in modern
SQLAlchemy), which would race on `test_postgres_sqlalchemy` under xdist.
Wiring:
- Add pytest-xdist to the dev dependency group.
- New `PYTEST_XDIST` variable in docker/Makefile.ci-tests; when set, the
test recipes append `-n <value>`. Empty (default) keeps tests serial.
- New `pytest_xdist` input on _run-tests.yml that forwards to the make
target.
- api-ci.yml and shared-ci.yml set `pytest_xdist: auto`; worker-ci.yml
keeps it empty with an inline note explaining why.
2. Consolidate self-hosted build/push so we only build once per run
Previously `api-self-hosted` (and `worker-self-hosted`) ran on push to
main and called _self-hosted.yml a second time, just to push a rolling
tag whose image had already been built moments earlier by
`api-build-self-hosted`. The second call would cache-hit but still
spun up a fresh runner, restored cache, and `docker load`d the tar
(~50-90s of pure overhead).
Replace the two-job pattern with a single `*-build-self-hosted` job
that always builds and conditionally sets `push_rolling` based on the
same push-to-main condition. The inner _self-hosted.yml already gates
its push job on `inputs.push_rolling == true`.
Net effect: removes ~50-90s from push-to-main critical path and one
redundant runner per app per main push.
Contributor
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #901 +/- ##
==========================================
- Coverage 91.91% 91.91% -0.01%
==========================================
Files 1316 1316
Lines 50191 50236 +45
Branches 1625 1625
==========================================
+ Hits 46133 46174 +41
- Misses 3752 3756 +4
Partials 306 306
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
…xdist
CI on the parent commit revealed two distinct xdist failures:
1. API: `test_when_repository_has_null_head_has_parent_report` and a few
sibling bundle-analysis tests in `test_pull.py` / `test_commit.py`
failed with `sqlite3.OperationalError: no such table: bundles`. The
tests `os.system("rm -rf /tmp/bundle_analysis_*")` and assert via
`os.listdir("/tmp")` that no `bundle_analysis_*` files remain. Under
xdist those operations clobber other workers' temp SQLite files
created by `tempfile.mkstemp(prefix="bundle_analysis_")` in
`shared/bundle_analysis/report.py`.
Fix: in apps/codecov-api/conftest.py, on `pytest_configure`, point
each xdist worker at its own `TMPDIR` (`<tmp>/pytest_<worker_id>`)
and update the four affected tests to use `tempfile.gettempdir()`
instead of hard-coded `/tmp`. Behaviorally a no-op when running
serially (PYTEST_XDIST_WORKER unset).
2. Shared: enabling xdist exposed an unrelated race in shared's custom
`django_setup_test_db` where multiple workers collide on
`test_postgres_gw0` ("already exists" / "is being accessed by other
users"). That's a separate plumbing change in libs/shared; back the
shared workflow down to `pytest_xdist: ""` for now with an inline
note.
API stays on `pytest_xdist: auto` and is the headline win
(measured ~360s -> 218s on the failing run before this fix).
… the win The API xdist run uncovered a second, unrelated test isolation issue in `upload/tests/views/test_uploads.py`: 7 parametrized variants of `test_uploads_post_tokenless` and `test_uploads_post_token_required_auth_check` fail with `assert list(upload.flags.all()) == [flag1, flag2]` returning only `[flag2]` even though both `UploadFlagMembership` rows are present in the DB for that upload. That looks like Django M2M-related caching or a parametrize-ordering effect that's worth fixing in a focused follow-up rather than gating this CI optimization PR on it. Setting `pytest_xdist: ""` for API turns it back to serial. The wiring (input + Makefile flag + pytest-xdist dependency + bundle_analysis tmpdir isolation in the API conftest) all stays in the PR; flipping API back on xdist is a one-line change once the upload-tests issue is fixed. The headline win in this PR is now the self-hosted job consolidation (items 3+4): one build per app per run, push only on main.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two CI improvements:
1. Consolidate self-hosted build/push so we build once per run (items 3+4 from the speedup plan)
On push-to-main, both
api-build-self-hostedandapi-self-hostedran. The second job called_self-hosted.ymlagain purely to push the rolling tag — it cache-hit on the build but still spun up a fresh runner, restored cache, anddocker loadd the tar (~50-90s of pure overhead per app).This PR collapses each pair into a single
*-build-self-hostedjob that always builds and conditionally pushes by computingpush_rollingfrom the same push-to-main condition. The inner_self-hosted.ymlalready gates its push job oninputs.push_rolling == true, so no changes were needed there.api-self-hostedandworker-self-hostedtop-level jobs inapi-ci.ymlandworker-ci.yml.API CI / Build Self Hosted (API) / Build Self Hosted App, etc.) are preserved.2. Wire pytest-xdist plumbing (off by default for now)
The plumbing for opting individual test jobs into pytest-xdist is in place but every app keeps
pytest_xdist: ""(serial). This unblocks future enablement as a one-line flip per app:pytest-xdist>=3.6.1added to thedevdep group;uv.lockupdated.PYTEST_XDISTMake variable indocker/Makefile.ci-tests. When non-empty, the test recipes append-n <value>. Empty default = serial (no behavior change).pytest_xdistinput on_run-tests.ymlforwarded to the make recipe.tempfile.gettempdir()per xdist worker so the bundle-analysis tests ingraphql_api/tests/test_pull.pyandtest_commit.py(which clean upbundle_analysis_*files in the system tmpdir) won't race once xdist is enabled. Those tests now usetempfile.gettempdir()instead of hard-coded/tmp— small hygiene improvement under serial mode too.Why xdist is staying off in this PR
A run with
pytest_xdist: autoon this branch hit two separate test-isolation regressions:upload/tests/views/test_uploads.py::test_uploads_post_tokenlessandtest_uploads_post_token_required_auth_check(7 parametrized variants) failed under xdist withassert list(upload.flags.all()) == [flag1, flag2]returning only[flag2], even though bothUploadFlagMembershiprows were demonstrably present in the DB for that upload. Looks like Django M2M-related caching or a parametrize-ordering effect — worth fixing in a focused follow-up rather than gating this PR.test_postgres_gw0"already exists" / "is being accessed by other users") inside shared's customdjango_setup_test_dbinlibs/shared/shared/testutils.py. Separate plumbing change.Both are tracked for follow-ups. The wiring in this PR means re-enabling later is a one-line change per workflow.
For the record, the API xdist run (before backing off) measured 3:38 vs the previous ~6:00 serial baseline, so the underlying speedup is real and worth pursuing in a follow-up.
Test plan
Push Self Hosted Image (API/Worker)check name being gone is intentional w.r.t. branch protection.Notes / follow-ups
apps/worker/conftest.pyuses the deprecated xdistslaveinputAPI and mutatesengine.url).django_setup_test_dbworker-safe.upload.flags.all()discrepancy intest_uploads.py.