
Optimize CI for wolfProvider #400

Open
aidangarske wants to merge 9 commits into wolfSSL:master from aidangarske:ci-draft-pause

@aidangarske
Member

@aidangarske aidangarske commented May 23, 2026

Description

  • Trigger the OSP projects to run nightly and send a Slack message on failure
  • Dynamically resolve the latest wolfSSL and OpenSSL versions
  • (All OSP projects were being tested against OpenSSL 3.0.20 from debian:bookworm, not 3.5.4)
  • Add UBSan and ASan builds for wolfProvider specifically
  • Add smoke tests for draft PRs
  • Run the full suite only on open PRs; run only the smoke tests on drafts
  • No apt-get at job time; use the ghcr container instead
  • Backward compat for 5.8.4

Related PRs that need to merge first, in this order, before this one:

  1. wolfProvider: 5.9.1 FIPS patches (krb5, hostap, stunnel, libssh2, curl) osp#340
  2. https://github.com/wolfSSL/testing/pull/962
  3. https://github.com/wolfSSL/testing/pull/958

Copilot AI review requested due to automatic review settings May 23, 2026 06:27
@aidangarske aidangarske marked this pull request as draft May 23, 2026 06:30
@aidangarske aidangarske changed the title ci: pause non-smoke workflows on draft PRs, add smoke preflight Optimize CI for wolfProvider May 23, 2026
@aidangarske aidangarske reopened this May 23, 2026
@aidangarske aidangarske self-assigned this May 23, 2026
@aidangarske aidangarske requested review from Copilot and dgarske and removed request for Copilot May 23, 2026 06:43
@aidangarske aidangarske marked this pull request as ready for review May 25, 2026 19:25

This comment was marked as resolved.

aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
…ew fix)

Was: every workflow pulled ghcr.io/wolfssl/wolfprovider-test-deps:bookworm,
which doesn't exist until upstream master runs the publish workflow.
Bootstrap chicken-and-egg.

Now: publish-test-deps-image.yml fires on any branch push (and PRs)
and pushes to ghcr.io/<repo-owner>/wolfprovider-test-deps:bookworm.
Consumer workflows read from the PR head's owner when on a PR, else
the running repo's owner. Result: a fork PR publishes to the fork's
ghcr namespace and pulls from it; master pushes publish to the org's
ghcr namespace and pulls from it.
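
A minimal shell sketch of the owner-selection rule described above (an approach the next commit reverts), assuming the PR head owner is passed in explicitly. `image_for_owner` and its arguments are hypothetical; in the workflows this is a GitHub Actions expression, not a script. The owner is lowercased because container image names must be lowercase, which is why wolfSSL's image lives under ghcr.io/wolfssl.

```sh
#!/bin/sh
# Hypothetical helper illustrating per-owner image selection;
# not the workflows' actual implementation.
#   $1  event name (e.g. "pull_request" or "push")
#   $2  owner of the repository running the workflow
#   $3  owner of the PR head repository (empty when not a PR)
image_for_owner() {
    event=$1
    repo_owner=$2
    pr_head_owner=$3

    # On a PR, use the head repo's owner (the fork); otherwise the running repo's.
    if [ "$event" = "pull_request" ] && [ -n "$pr_head_owner" ]; then
        owner=$pr_head_owner
    else
        owner=$repo_owner
    fi

    # Container image references must be lowercase.
    owner=$(printf '%s' "$owner" | tr '[:upper:]' '[:lower:]')
    printf 'ghcr.io/%s/wolfprovider-test-deps:bookworm\n' "$owner"
}
```

For example, `image_for_owner pull_request wolfSSL aidangarske` prints `ghcr.io/aidangarske/wolfprovider-test-deps:bookworm`, while a push on the canonical repo resolves to `ghcr.io/wolfssl/wolfprovider-test-deps:bookworm`.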

Also fixes copilot review feedback from
wolfSSL#400 (review)

- Phase B log filename renames broke check-workflow-result.sh's
  hardcoded log paths (curl-test.log, openvpn-test.log, sssd-test.log,
  net-snmp-test.log, nginx-test.log, openssh-test.log, tcpdump-test.log,
  liboauth2-test.log, stunnel-test.log) plus in-step greps in cjose,
  libcryptsetup, libfido2, libhashkit2, libtss2, opensc, python3-ntp,
  qt5network5, tnftp, tpm2-tools. Reverted log names back to
  <app>-test.log; second mode overwrites first.
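- libtss2.yml: fix `if $(grep -q ...)` (fragile shell -- grep -q prints
  nothing, so the command substitution used as the if condition expands
  to an empty command, and the test only works because that empty
  command inherits grep's exit status). Use `if grep -q ...; then`.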
- opensc.yml: fix `TEST_RESULT=$(((grep ...) && echo 0 || echo 1))`
  (arithmetic expansion `(( ))` can't contain shell commands). Hoist
  to a check_opensc_log() function called from both modes.
- stunnel.yml: `grep -c "failed: 0"` prints 1 (the match count) on success, but
  check-workflow-result.sh expects TEST_RESULT==0 for pass.
  Use `if grep -q ...; then TEST_RESULT=0; else TEST_RESULT=1; fi`.
  Also mirror tests/logs/results.log to stunnel-test.log so the
  force-fail check finds the expected file.
- hostap.yml: drop continue-on-error from the normal-mode test step.
  With it set, the step's exit code was swallowed and normal-mode test
  failures didn't fail the job.
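
The libtss2, opensc and stunnel fixes above converge on one pattern: use grep's exit status directly as the `if` condition and map it explicitly onto the TEST_RESULT convention (0 = pass) that check-workflow-result.sh expects, instead of capturing grep's output or match count. A minimal sketch; `check_log` and its arguments are hypothetical and not the helper the workflows define (opensc's is check_opensc_log()).

```sh
#!/bin/sh
# Hypothetical helper: sets TEST_RESULT=0 if the pass pattern appears in
# the log, TEST_RESULT=1 otherwise (including when the log is missing).
#   $1  log file to inspect
#   $2  fixed-regex pattern that indicates a passing run
check_log() {
    log_file=$1
    pass_pattern=$2
    # grep -q prints nothing; it exits 0 on a match, 1 on no match and
    # 2 on error (e.g. missing file), so branch on the exit status.
    if grep -q -- "$pass_pattern" "$log_file" 2>/dev/null; then
        TEST_RESULT=0
    else
        TEST_RESULT=1
    fi
}
```

Usage would look like `check_log stunnel-test.log "failed: 0"` followed by a check of `$TEST_RESULT`.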

One-time setup: after this lands, the owner of each fork that opens a
PR has to make their ghcr.io/<owner>/wolfprovider-test-deps package
public (GitHub UI: Packages -> Package settings -> Change visibility).
GitHub's Actions runners can only pull public packages from another
namespace.
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
…vate)

Earlier commits tried to make fork CI work by:
  - having publish-test-deps-image.yml push to a per-owner ghcr namespace
    (ghcr.io/<owner>/wolfprovider-test-deps)
  - having consumer workflows pull from the PR head's owner
  - auto-PATCHing the test-deps package to visibility=public
  - dropping the `github.repository == 'wolfSSL/wolfProvider'` guard on
    the wolfprov-debs ORAS pull in build-wolfprovider.yml

That path only works if the packages can be public, which they can't
(some of the .debs contain commercially-licensed bits). Revert to the
canonical-only behavior:

publish-test-deps-image.yml
  - fires only on push to master/main (was '**')
  - guards the publish on github.repository == 'wolfSSL/wolfProvider'
  - drops the per-owner namespace; always pushes to
    ghcr.io/wolfssl/wolfprovider-test-deps
  - removes the Mark-package-public step

build-wolfprovider.yml
  - restores the github.repository == 'wolfSSL/wolfProvider' guard on
    the Login, Download .debs, and Download WIC steps

39 consumer workflows
  - container.image reverted from the per-owner expression back to the
    literal ghcr.io/wolfssl/wolfprovider-test-deps:bookworm

Practical effect: PR CI and nightly only run on the canonical repo
(or once PR wolfSSL#400 merges, on wolfSSL/wolfProvider's runners). Fork
pushes will skip the wolfprov-deb pull and any container-using job
will fail loud at the image pull -- which is the right signal: those
runs need to happen on the canonical repo.
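
A minimal sketch of the canonical-only publish guard restored above, written as a shell function for illustration; the real guard is an `if:` expression in publish-test-deps-image.yml. `should_publish` is hypothetical; GITHUB_REPOSITORY and GITHUB_REF are the standard Actions environment variables it would be fed. One difference worth knowing: GitHub expression string comparisons ignore case, whereas the shell `=` below is case-sensitive.

```sh
#!/bin/sh
# Hypothetical sketch of the canonical-only publish guard.
#   $1  repository slug (GITHUB_REPOSITORY in Actions)
#   $2  full git ref    (GITHUB_REF in Actions)
# Returns 0 only for pushes to master/main on wolfSSL/wolfProvider.
should_publish() {
    [ "$1" = "wolfSSL/wolfProvider" ] || return 1
    case "$2" in
        refs/heads/master|refs/heads/main) return 0 ;;
        *) return 1 ;;
    esac
}
```

In a step this could be used as `if should_publish "$GITHUB_REPOSITORY" "$GITHUB_REF"; then ...; fi`.
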
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
…idation)

Add pull_request trigger to nightly-osp.yml so PR wolfSSL#400's reviewers
can see the dispatcher actually fan all 41 reusable workflows out
and the notify job hit Slack.

Marked temporary in the file header -- revert this trigger before
merging if you don't want the full nightly job set firing on every
PR. (For everyday CI, scheduled + workflow_dispatch is the intended
shape.)

Note: PR runs from forks will still hit the private-package issue
for the wolfprov-debs pull (the wolfSSL/wolfProvider repo guard
short-circuits the ORAS step on non-canonical repos). The plumbing
itself -- dispatch, discover-versions, notify, Slack -- runs
regardless and is what this PR-trigger lets you verify end-to-end.
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
Adds aidangarske/wolfProvider to the publish workflow's repository
allowlist so PR wolfSSL#400's working branch can bootstrap a test-deps
image on the fork's ghcr namespace. Pushed image lands at
ghcr.io/aidangarske/wolfprovider-test-deps:bookworm.

Also adds 'ci-draft-pause' to the branches list (alongside master/
main) so a push to that branch triggers the workflow without needing
a separate workflow_dispatch.

Consumer workflows continue to pull from ghcr.io/wolfssl/... so this
fork-side push is purely for the fork owner to verify the
build/push pipeline works end to end before PR merges. After merge,
the canonical wolfSSL/wolfProvider master push will publish the
authoritative image and consumers will find it.

Note: the 'ci-draft-pause' branch entry is TEMPORARY for PR wolfSSL#400.
Drop it (and remove aidangarske from the allowlist if desired)
once the PR merges.
dgarske pushed a commit that referenced this pull request May 26, 2026
)

Bootstrap PR: introduces the test-deps container image that PR #400's
nightly OSP workflows consume. This is a minimal subset of PR #400
intended to merge first, so the publish workflow fires once on master
and the test-deps image lands at ghcr.io/wolfssl/wolfprovider-test-deps:bookworm
before the rest of PR #400 merges. Without this, PR #400's
OSP container jobs all fail with "manifest unknown" because the image
they pull doesn't exist anywhere yet.

Two files only:
  docker/wolfprovider-test-deps/Dockerfile
    Single Debian-bookworm image with every apt dep that the OSP
    integration tests used to install at job time. One apt-get update
    at build time, zero at job time -- eliminates Debian mirror flake.

  .github/workflows/publish-test-deps-image.yml
    Builds the Dockerfile and pushes to
    ghcr.io/wolfssl/wolfprovider-test-deps:bookworm on push to
    master/main (path-filtered to docker/wolfprovider-test-deps/**)
    or workflow_dispatch. Guarded with
    github.repository == 'wolfSSL/wolfProvider' so forks don't try
    to push to wolfSSL's namespace.

The OSP workflows themselves, the discover-versions resolver, the
ASan/UBSan workflow, and all the matrix/force-fail consolidation
land via PR #400 once this is in place.
dgarske added a commit that referenced this pull request May 26, 2026
ci: bootstrap test-deps Docker image (prep for PR #400)
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 26, 2026
PR wolfSSL#402 published ghcr.io/wolfssl/wolfprovider-test-deps:bookworm.
This empty commit bumps the head SHA so PR wolfSSL#400's checks rerun
against the now-existing image.
@aidangarske aidangarske force-pushed the ci-draft-pause branch 3 times, most recently from 5ce6df6 to 91f2549 Compare May 27, 2026 04:50
@aidangarske aidangarske requested review from ColtonWilley and padelsbach and removed request for dgarske May 27, 2026 04:54
@aidangarske aidangarske force-pushed the ci-draft-pause branch 2 times, most recently from 82d537b to e5226fb Compare May 27, 2026 05:21

@wolfSSL-Fenrir-bot wolfSSL-Fenrir-bot left a comment


Fenrir Automated Review — PR #400

Scan targets checked: wolfprovider-bugs, wolfprovider-src

No new issues found in the changed files. ✅

@aidangarske
Member Author

Jenkins retest this please

aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 27, 2026
…n run

The Smoke Test workflow ran on PR wolfSSL#400 head commit and concluded as
startup_failure with 0 jobs. That's GH Actions failing to validate the
workflow before any container spawns. Compared against every other
workflow that calls _discover-versions.yml (simple, cmdline,
multi-compiler, fips-ready, sanitizers, seed-src), smoke-test.yml is
the only one with a workflow-level 'permissions: contents: read' block.

The reusable _discover-versions.yml job declares
'permissions: { contents: read, packages: read }' for its oras login
ghcr.io step. Workflow-level permissions clamp every job including
reusable workflows, so the discover_versions job ended up with strictly
fewer permissions than it declared, which trips startup validation.

Grant packages:read at the workflow level so the reusable workflow's
declared permissions can be satisfied. Keep the explicit block instead
of removing it - the other working workflows just rely on the repo
default token, but smoke-test.yml should stay explicit since it's the
gate everything else waits on.
aidangarske and others added 6 commits May 27, 2026 13:38
* Orchestrate the OSP suite via a single Nightly OSP workflow
  (.github/workflows/nightly-osp.yml) that fans out every per-app
  workflow (bind9, cjose, curl, debian-package, git-ssh-dr, grpc,
  hostap, iperf, krb5, libcryptsetup, libeac3, libfido2, libhashkit2,
  libnice, liboauth2, librelp, libssh2, libtss2, libwebsockets,
  net-snmp, nginx, openldap, opensc, openssh, openvpn, pam-pkcs11,
  ppp, python3-ntp, qt5network5, rsync, socat, sscep, sssd, stunnel,
  systemd, tcpdump, tnftp, tpm2-tools, x11vnc, xmlsec) plus the
  openssl-version sweep and the static-analysis suite, then aggregates
  results to Slack.
* Resolve wolfSSL and OpenSSL versions dynamically per nightly run
  via .github/workflows/_discover-versions.yml so the matrix reflects
  what actually ships on ghcr.io and what's latest upstream rather
  than what was hand-bumped here.
* Switch OSP test jobs to the test-deps image
  ghcr.io/wolfssl/wolfprovider-test-deps:bookworm with all deps
  pre-installed (built by .github/workflows/publish-test-deps-image.yml
  from docker/wolfprovider-test-deps/Dockerfile).
* Drop the openssl-3.0.20 -> 3.5.4 source build from the OSP path;
  OSP suites now use the bookworm system OpenSSL (which is the
  wolfprov-replace-default .deb on ghcr).
* Add a dedicated Sanitizers workflow that builds wolfssl + wolfprov
  with -fsanitize=address,undefined (one job) and -fsanitize=thread
  (separate job -- ASan and TSan can't coexist in one binary), then
  runs the cmd-tests + wolfprov unit tests under each. Cache
  openssl-source/install across runs so source-build skips when refs
  match. WOLFPROV_SKIP_TEST=1 lets the build step skip the internal
  make test (which needed LD_PRELOAD=libasan and segfaulted dpkg/grep
  in the build path) and run unit tests as a separate step instead.
  ASAN_OPTIONS=detect_odr_violation=0 suppresses a known false
  positive from the provider's static ASN.1 table being linked into
  both libwolfprov.so and the test binary. For TSan, the unit-test
  step skips LD_PRELOAD entirely -- libtsan is wired in via DT_NEEDED
  on the TSan-built test binary, and preloading it into make crashes
  the non-TSan host process.
* Convert .github/workflows/static-analysis.yml (cppcheck, clang
  scan-build, Facebook Infer) from a standalone 2 AM cron to
  workflow_call so it runs in the nightly-osp fan-out alongside the
  OSP integrations. Single nightly cadence, single Slack summary.
* Smoke test gate (.github/workflows/smoke-test.yml) runs on every
  push/PR including drafts; other PR-time workflows wait for it via
  .github/actions/wait-for-smoke.
* PR mode runs smoke + simple + cmd-tests + multi-compiler + fips-ready
  + codespell + sanitizers. The full OSP matrix and the heavy static
  analyzers only run nightly / on workflow_dispatch.
* Bump every per-app OSP workflow timeout-minutes to >= 60 so flaky
  long-tail tests don't trip the previous 15/20/30 minute caps.
* Document the full CI structure in .github/README.md -- three tiers
  (PR/push, nightly, reusable), per-OSP inventory with the wolfprov
  surface each one exercises, the WOLFPROV_FORCE_FAIL XOR sanity
  check, the OSP workflow template, and a failure -> log-section
  cheat sheet.
* Fix a real ASan global-buffer-overflow caught by the new sanitizer
  job: src/wp_aes_aead.c was using XMEMCMP(params->key, X, sizeof(X))
  to compare a NUL-terminated provider parameter name against a
  string literal, which overreads the caller's buffer when their key
  is shorter than the constant (e.g. "tlsivinv" vs "tlsivfixed").
  Switch to XSTRCMP for the five AEAD parameter key checks.
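
The WOLFPROV_FORCE_FAIL XOR sanity check mentioned above can be read as follows: a result is only trusted if the tests pass in normal mode and fail when wolfProvider is forced to fail, presumably to show the application really exercises the provider. A minimal sketch, assuming TEST_RESULT uses 0 for pass and force-fail mode is signalled by WOLFPROV_FORCE_FAIL=1; `expect_outcome` is a hypothetical name, not the logic of check-workflow-result.sh verbatim.

```sh
#!/bin/sh
# Hypothetical sketch of an XOR-style result check.
#   $1  TEST_RESULT from the test step (0 = tests passed)
#   $2  WOLFPROV_FORCE_FAIL setting ("1" = provider forced to fail)
# Returns 0 when the outcome is the expected one for the mode.
expect_outcome() {
    if [ "$1" -eq 0 ]; then failed=0; else failed=1; fi
    if [ "${2:-0}" = "1" ]; then forced=1; else forced=0; fi

    # failed XOR forced == 1 means the outcome contradicts the mode:
    # a failure in normal mode, or a pass despite the forced failure.
    if [ $((failed ^ forced)) -eq 1 ]; then
        return 1
    fi
    return 0
}
```

So `expect_outcome 0 0` and `expect_outcome 1 1` succeed, while a pass under force-fail (`expect_outcome 0 1`) is flagged just like an ordinary failure.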

This pairs with wolfssl/osp PR wolfSSL#340 which provides the 5.9.1 FIPS
patches the per-app workflows reference. Once that merges these
workflows will be green end-to-end.
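
_discover-versions.yml itself isn't shown here. As an illustration only, the core step of picking the newest wolfSSL stable release from a list of tag names might look like the sketch below, assuming tags of the form vX.Y.Z-stable (as in v5.8.0-stable) and GNU sort's -V version ordering. `latest_stable` is a hypothetical helper, not the resolver this PR adds.

```sh
#!/bin/sh
# Hypothetical helper: reads tag names on stdin (one per line) and prints
# the highest vX.Y.Z-stable tag. Relies on GNU sort -V for version order,
# so a v5.10.0-stable tag would sort after v5.9.1-stable.
latest_stable() {
    grep -E '^v[0-9]+\.[0-9]+\.[0-9]+-stable$' | sort -V | tail -n 1
}
```

Fed from the remote, this could be `git ls-remote --tags --refs https://github.com/wolfSSL/wolfssl.git | sed 's|.*refs/tags/||' | latest_stable`.
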

These workflows were running apt-get update on the host runner for every
job, which is slow and intermittently hangs (e.g. the clang-14 build in
run 26527356013 timed out after 20m on apt-get). All the packages they
install are already in the test-deps container.

Add 'container: ghcr.io/wolfssl/wolfprovider-test-deps:bookworm' to
each job and drop the apt-get step:
- sanitizers.yml: both ASan+UBSan and TSan jobs - the install set
  (build-essential autoconf automake libtool pkg-config git curl wget
  patch m4 gettext) is already baked in.
- static-analysis.yml: cppcheck, scan-build, and infer jobs. cppcheck,
  clang, clang-tools, and build deps are already baked. Add opam to
  the image so the infer job can drop its apt step too. Infer itself
  (~100MB tarball) is still downloaded at job time to keep the image
  small.
- libnice.yml: drop the redundant apt step entirely - the workflow
  was already running in the container; build-essential, pkg-config,
  meson, ninja-build, libglib2.0-dev, libgstreamer1.0-dev, and
  gstreamer1.0-plugins-base-apps are all in the image. Add the one
  missing piece (libunwind-dev) to the Dockerfile.

Dockerfile delta: add opam (infer dep) and libunwind-dev (libnice
dep). Image rebuilds on push via publish-test-deps-image.yml.

multi-compiler.yml is not converted in this commit. Its matrix needs
gcc-9, gcc-10, gcc-13, gcc-14, and clang-12 which are not available
in Debian Bookworm; that workflow needs either a separate ubuntu-base
container or a matrix reduction.

PR-time multi-compiler was running apt-get update on the runner before
installing gcc-X/clang-X. When the runner's apt cache was stale, this
hung past the 20m job timeout (e.g. clang-14 in run 26527356013) and
cancelled the whole compiler-matrix run.

Restrict the PR-time matrix to compilers that ship in the test-deps
container (Debian Bookworm: gcc-11, gcc-12, clang-13, clang-14,
clang-15) and run inside the container, so the apt-get step goes
away entirely. Six matrix entries cover the common compiler bases
plus one entry pinned to wolfssl v5.8.0-stable for back-compat.

Dropped from PR-time vs prior matrix: gcc-9, gcc-10, gcc-13, gcc-14,
clang-12 (not in Bookworm or its backports).

To restore that coverage at nightly cadence, add
nightly-multi-compiler.yml which runs the FULL original matrix on
GitHub-hosted Ubuntu runners (22.04 + latest). The verify-or-install
step skips apt-get when the compiler is already on PATH, so most
matrix entries don't apt-get at all and the slow path only fires
when a runner image change drops a pre-installed compiler. Wired
into nightly-osp.yml as 'multi-compiler:' with the matching needs:
entry on the Slack notify job.

resolve-ref.sh pipes curl into jq, but the image was missing jq;
multi-compiler hit 'jq: command not found' in run 26529715823.

The test-deps container's default shell is dash (Debian's /bin/sh),
not bash. 'source' is a bash builtin - dash has no such command, so
the sanitizers step exits 127 with 'source: not found' before the
actual test runs (job 78146645114).

Add 'shell: bash' to the three steps in sanitizers.yml that source
scripts/env-setup (ASan job test + cmd-tests, TSan job test). Other
PR-time workflows that use 'source scripts/env-setup' (cmdline,
fips-ready) run on the host ubuntu-22.04 runner where bash is the
default, so they don't need this fix.
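
The commit fixes this by forcing `shell: bash`. An alternative, only viable if scripts/env-setup itself sticks to POSIX syntax (not verified here), is the POSIX `.` builtin, which both dash and bash support. A minimal sketch; `load_env` is a hypothetical wrapper, and the test uses a throwaway file in place of scripts/env-setup.

```sh
#!/bin/sh
# `source` is a bash builtin; dash (Debian's /bin/sh) rejects it with
# "source: not found". The POSIX `.` builtin works in both shells.
# Note: `.` searches PATH when the argument has no slash, so pass a path
# such as ./scripts/env-setup rather than a bare file name.
load_env() {
    . "$1"
}
```

Because `.` runs the file in the current shell, variables it sets remain visible to later commands in the same step.
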

…d-14-dev

publish-test-deps-image run 26537414820 failed:
  libunwind-14-dev : Conflicts: libunwind-dev

clang pulls libunwind-14-dev in transitively. If libnice actually needs
libunwind it can use libunwind-14-dev, not the unversioned package.

OpenSSL/wolfSSL build errors get logged to scripts/build-release.log
by build-wolfprovider.sh, not test-suite.log. Without dumping the
build log the workflow just shows 'Build OpenSSL master ... ERROR.'
with no detail. Match the sanitizers.yml log-dump pattern.
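
A minimal sketch of the dump-on-failure idea; `run_build` is hypothetical and is not the sanitizers.yml step itself. A call such as `run_build scripts/build-release.log ./scripts/build-wolfprovider.sh` assumes the build script lives under scripts/.

```sh
#!/bin/sh
# Hypothetical wrapper: run a build command and, if it fails, print the
# named log to stderr so the failure detail shows up in the job output.
#   $1   log file to dump on failure (e.g. scripts/build-release.log)
#   $2+  build command and its arguments
run_build() {
    log=$1
    shift
    if "$@"; then
        return 0
    fi
    echo "Build failed; contents of $log:" >&2
    if [ -f "$log" ]; then
        cat "$log" >&2
    else
        echo "(no log at $log)" >&2
    fi
    return 1
}
```

The wrapper still returns non-zero after dumping, so the step (and the job) fails as before; only the visibility of the error changes.
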

multi-compiler matrix asks for gcc-11, gcc-12, clang-13, clang-14,
clang-15 as explicit binary names. The image only had unversioned
'gcc' (= gcc-12) and 'clang' (= clang-14), so make hit 'gcc-11: not
found' for any other matrix entry (run 26538567521 job 78173898262).