ci: upgrade to GPU runner via moveit_pro_ci v0.3.1, enrich test diagnostics by davetcoleman · Pull Request #655 · PickNikRobotics/moveit_pro_example_ws

davetcoleman · 2026-05-21T22:32:15Z

CI infra upgrade and test-diagnostics improvements for objectives_integration_test. Three commits, kept separate intentionally (please do not squash).

Commit 1 — `enable_gpu: true` on a `picknik-16-amd64-gpu` runner

Bumps the reusable workflow ref to PickNikRobotics/moveit_pro_ci@v0.3.1, sets enable_gpu: true, and switches the runner label from picknik-16-amd64 to picknik-16-amd64-gpu. v0.3.1 appends the CUDA suffix to the image when enable_gpu is true (moveit_pro_ci#26) — without that, v0.3.0 set the runner label but kept the non-CUDA image, so MuJoCo's EGL rendering still went through llvmpipe on CPU.

image_tag is pinned to 9.3.0-rc9 until the main-*-cuda12.6-cudnn9 images are published.

Test-diagnostics addition: `src/lab_sim/test/conftest.py`

Two pytest hooks (pytest_runtest_logstart, pytest_runtest_logreport) write directly to fd 2, bypassing pytest's --capture=fd. Without this, a CTest timeout kills pytest before any per-test output is flushed, leaving the CI log silent past "collected N items". Now each test prints START <nodeid> on entry and PASSED|FAILED|SKIPPED <nodeid> (<elapsed>s) on completion, so CI logs always show which objective was running and how long each one took — critical for triaging flakes and timeouts.

Commit 2 — `.github/scripts/render_report.py` (HTML report)

Self-contained Python script that turns pytest's objectives_integration_test.xunit.xml artifact into a single-file HTML report:

- Groups tests by parent XML directory; row shows only the filename, group header shows the full path and is collapsible.
- Resolves the human-readable objective name (e.g. move_flasks_to_burners.xml → Move Flasks to Burners) by reading each XML's main_tree_to_execute attribute against the local moveit_pro / moveit_pro_example_ws checkouts.
- Filter buttons (All / Failed / Passed / Skipped), live filename search, click-to-expand failure messages.
- No external JS/CSS dependencies: everything is inlined so the report opens straight from a CI artifact download.
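The objective-name lookup in the list above might be sketched as follows. The function names and the search strategy (first match by filename under a list of checkout roots) are assumptions for illustration; only the `main_tree_to_execute` attribute comes from the description.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def objective_name(xml_path):
    """Return the root element's main_tree_to_execute attribute, or None."""
    try:
        root = ET.parse(str(xml_path)).getroot()
    except (ET.ParseError, OSError):
        return None
    return root.get("main_tree_to_execute")


def resolve_objective(filename, search_roots):
    """Look for `filename` under each checkout root and return its display name.

    Falls back to the bare filename when no readable objective is found.
    """
    for root in search_roots:
        for candidate in sorted(Path(root).rglob(filename)):
            name = objective_name(candidate)
            if name:
                return name
    return filename
```

Falling back to the filename keeps a report row usable even when the checkouts are missing on the machine that renders the report.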

Usage: python3 .github/scripts/render_report.py <xunit.xml> <out.html>.
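For orientation, the core of such a renderer (parse the xunit XML, group by parent directory, emit collapsible HTML) can be sketched in a few lines. This is not the script from the PR. In particular, it assumes the objective path appears as the pytest parameter id in each testcase name (for example `test_objective[objectives/a.xml]`), which is a hypothetical naming scheme.

```python
import html
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import PurePosixPath


def case_status(case):
    """Map a JUnit <testcase> element to passed / failed / skipped."""
    if case.find("failure") is not None or case.find("error") is not None:
        return "failed"
    if case.find("skipped") is not None:
        return "skipped"
    return "passed"


def group_cases(xunit_text):
    """Group (filename, status) rows by the parent directory of each objective path."""
    groups = defaultdict(list)
    for case in ET.fromstring(xunit_text).iter("testcase"):
        name = case.get("name", "")
        # ASSUMPTION: the path is the parameter id inside the trailing brackets.
        match = re.search(r"\[(.+)\]$", name)
        path = PurePosixPath(match.group(1) if match else name)
        groups[str(path.parent)].append((path.name, case_status(case)))
    return dict(groups)


def render_html(groups):
    """Render one collapsible <details> block per directory, with no external assets."""
    parts = ["<!doctype html><meta charset='utf-8'><title>Integration test report</title>"]
    for directory in sorted(groups):
        parts.append(f"<details open><summary>{html.escape(directory)}</summary><ul>")
        for filename, status in groups[directory]:
            parts.append(f"<li class='{status}'>{html.escape(filename)}: {status}</li>")
        parts.append("</ul></details>")
    return "\n".join(parts)
```

The `<details>`/`<summary>` pair gives collapsible groups without any JavaScript; filtering and search would need inlined script on top of this.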

Commit 3 — redirect ROS node logs into the test_results artifact

ament_add_pytest_test does not set ROS_LOG_DIR, so launched nodes write to ~/.ros/log/<ts>/ inside the doomed CI container — those logs never get uploaded, making post-mortem of objective failures impossible. Points ROS_LOG_DIR at build/lab_sim/test_results/lab_sim/ros_logs/, which is already inside the existing test-results artifact glob, so launch.log + per-node *.log come back with each CI run.

reset_simulation_before_test relaunches the stack per-test, so each test gets its own timestamped <ts>/ subdirectory under ros_logs/.
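The redirect amounts to setting one environment variable before any node is launched. The sketch below shows the idea from Python; how this PR actually sets the variable is not shown in the description, and the helper name `redirect_ros_logs` is hypothetical.

```python
import os
from pathlib import Path


def redirect_ros_logs(build_dir, env=None):
    """Point ROS_LOG_DIR at a folder inside the uploaded test-results tree."""
    env = os.environ if env is None else env
    # Mirrors the path named above: build/lab_sim/test_results/lab_sim/ros_logs/
    log_dir = Path(build_dir) / "lab_sim" / "test_results" / "lab_sim" / "ros_logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    # Nodes launched with this environment write their logs under log_dir.
    env["ROS_LOG_DIR"] = str(log_dir)
    return log_dir
```

Calling `redirect_ros_logs("build")` before the launch step would leave the logs under the same directory the artifact glob already collects.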

github-actions · 2026-05-21T22:32:25Z

⚠️ This PR modifies 1 file(s) that also exist in PickNikRobotics/moveit_pro_empty_ws.

Consider whether the change should land upstream in moveit_pro_empty_ws first so downstream forks pick it up on the next sync.

Overlapping files

.github/workflows/ci.yaml

davetcoleman · 2026-05-22T01:00:10Z

failing because of https://github.com/PickNikRobotics/moveit_pro/issues/19269

github-actions · 2026-05-22T01:44:55Z

📊 Integration test report

Per-distro HTML reports (status table + per-test ROS log slices) are attached to this run's artifacts:

integration-test-report-humble
integration-test-report-jazzy

Download the zip, extract, and open report.html in a browser.

davetcoleman · 2026-05-22T15:50:35Z

@JWhitleyWork @shaur-k — two design questions on #655 before I keep going.

State today: .github/scripts/render_report.py renders per-distro xunit XML + ROS logs into a styled HTML report; the workflow uploads it as an artifact and posts a sticky PR comment linking to the artifact zips (download, extract, open).

1. Publish reports to GitHub Pages on this repo? Setup: enable Pages → gh-pages source, then peaceiris/actions-gh-pages pushes each failed run's report to pr-<num>/<run-id>/<distro>/. Comment then links to https://picknikrobotics.github.io/moveit_pro_example_ws/pr-655/<run-id>/humble/report.html, viewable in-browser with no zip step. Reports would be public (same visibility as the repo). Alternative: keep the zip-download flow, or route through a third-party CDN like jsdelivr.
2. Lives here or in moveit_pro_ci? Renderer + workflow jobs are currently in moveit_pro_example_ws. They're generic: xunit + ROS logs in, HTML out, nothing example_ws-specific. If moveit_pro itself runs similar integration tests, the natural home is moveit_pro_ci's reusable workflow. I left it here because example_ws is the only confirmed consumer today.

Looking for yes / no / yes-but on either.
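For reference, the proposed Pages layout maps a run to a URL as in this small sketch. The function name and its defaults are illustrative only; the path scheme is the one proposed in the first question.

```python
def report_url(pr_number, run_id, distro,
               owner="picknikrobotics", repo="moveit_pro_example_ws"):
    """Build the in-browser report URL for the pr-<num>/<run-id>/<distro>/ layout."""
    return (
        f"https://{owner}.github.io/{repo}/"
        f"pr-{pr_number}/{run_id}/{distro}/report.html"
    )
```

Keying the path on the run id means a re-run never overwrites an earlier report, at the cost of the gh-pages branch growing with every failed run.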

Three CI-infra changes folded together:

1. Bump the reusable workflow ref. v0.3.1 (d490a1d) had the GPU + CUDA-suffix fix. v0.3.2 (90b506e, currently pre-tag) adds the test-results artifact-name suffix `-${{ matrix.ros_distro }}`, so the humble and jazzy jobs no longer both upload to the same artifact name (moveit_pro_ci#27). The pin is by SHA, not tag, so this works against the merged branch before the tag is formally cut.
2. Add render-report job (matrix on humble/jazzy) that downloads each distro's test-results artifact and runs .github/scripts/render_report.py against it, uploading report.html as integration-test-report-${{ matrix.ros_distro }}. Runs whether the integration test passed, failed, or timed out -- the report is most useful for failure post-mortem.
3. Add post-report-comment job that posts (or updates in place via a sticky marker) a single PR comment linking to the rendered reports for that run.

Also retained from before:

- enable_gpu: true on a picknik-16-amd64-gpu runner so MuJoCo EGL rendering uses the GPU instead of llvmpipe.
- src/lab_sim/test/conftest.py pytest hooks (logstart, logreport) writing to fd 2 directly so per-test progress survives a CTest timeout.
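The "sticky marker" behaviour of the post-report-comment job can be illustrated with a small decision function. This is only a sketch of the general pattern: the marker string and the comment dictionaries are hypothetical, and the real job presumably goes through the GitHub API or an existing action.

```python
# Hypothetical hidden marker; an HTML comment is invisible in the rendered PR comment.
MARKER = "<!-- integration-test-report -->"


def plan_sticky_comment(existing_comments, body):
    """Decide whether to update the marked comment or create a new one.

    Returns ("update", comment_id, text) or ("create", None, text).
    """
    text = f"{MARKER}\n{body}"
    for comment in existing_comments:
        if MARKER in comment.get("body", ""):
            return ("update", comment["id"], text)
    return ("create", None, text)
```

Because the marker travels inside the comment body, the job needs no stored state between runs to find its own earlier comment.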

Reads pytest's JUnit xunit XML (the test artifact already published by moveit_pro_ci's reusable workflow) and produces a self-contained, single-file HTML report. Groups tests by their parent XML directory, shows the human-readable objective name extracted from each objective XML's main_tree_to_execute attribute, and surfaces filter/search/collapse UI without any external JS dependencies.

ament_add_pytest_test does not set ROS_LOG_DIR, so launched nodes write to the default ~/.ros/log/<ts>/ inside the doomed CI container -- never uploaded. Point ROS_LOG_DIR at build/lab_sim/test_results/lab_sim/ros_logs/ instead, which is already inside the existing 'test-results' artifact glob, so launch.log + per-node *.log come back with each CI run.

JWhitleyWork · 2026-05-22T16:36:46Z


1. I would say it probably isn't necessary to publish each of these if they're attached to the runs. Usually we don't need to look at them unless we're investigating something. Up to you, though. I don't care that much.
2. This somewhat depends on whether you want to provide this to MIP users or not. moveit_pro_ci is a public repo where the reusable workflows/actions live, and both we internally and MIP users externally use it for CI jobs.

davetcoleman · 2026-05-22T17:07:23Z

> I would say it probably isn't necessary to publish each of these if they're attached to the runs. Usually we don't need to look at them unless we're investigating something. Up to you, though. I don't care that much.

The latest version only publishes the comment if the integration test fails, so I think it should also publish the HTML file only if it fails. How about that?

> This somewhat depends on whether you want to provide this to MIP users or not. moveit_pro_ci is a public repo where the reusable workflows/actions live, and both we internally and MIP users externally use it for CI jobs.

I don't mind if this is public, but my goal is for this to be used in moveit_pro... @shaur-k's new updates run the example_ws integration tests for every moveit_pro PR, right?

JWhitleyWork · 2026-05-22T17:18:31Z

> > I would say it probably isn't necessary to publish each of these if they're attached to the runs. Usually we don't need to look at them unless we're investigating something. Up to you, though. I don't care that much.
>
> The latest version only publishes the comment if the integration test fails, so I think it should also publish the HTML file only if it fails. How about that?

This is fine with me.

> > This somewhat depends on whether you want to provide this to MIP users or not. moveit_pro_ci is a public repo where the reusable workflows/actions live, and both we internally and MIP users externally use it for CI jobs.
>
> I don't mind if this is public, but my goal is for this to be used in moveit_pro... @shaur-k's new updates run the example_ws integration tests for every moveit_pro PR, right?

I don't think he has done this yet but I think it is planned.

This was referenced May 21, 2026

Standardize MuJoCo timestep and increase camera resolution across robot configs #648

Draft

CI bisect: does the camera resolution change alone break the gripper? #657

Draft

davetcoleman changed the title ~~CI bisect: does 9.3.0-rc9 + GPU runner pass with no scene changes?~~ ci: upgrade to GPU runner via moveit_pro_ci v0.3.1, enrich test diagnostics May 21, 2026

davetcoleman force-pushed the ci-bisect-9.3.0-rc9-only branch from 5efd377 to 8640403 Compare May 22, 2026 00:43

davetcoleman force-pushed the ci-bisect-9.3.0-rc9-only branch from 8640403 to 608fabe Compare May 22, 2026 01:22

davetcoleman added 3 commits May 22, 2026 09:53

davetcoleman force-pushed the ci-bisect-9.3.0-rc9-only branch from 608fabe to 62e8e3a Compare May 22, 2026 15:53

davetcoleman wants to merge 3 commits into main from ci-bisect-9.3.0-rc9-only
