fix(pegboard): isolate runner config dc lookup failures #4625

Merged
MasterPtato merged 1 commit into main from 04-12-fix_pegboard_isolate_runner_config_dc_lookup_failures
Apr 13, 2026
@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude

claude bot commented Apr 13, 2026

PR Review: fix(pegboard) - isolate runner config dc lookup failures

Summary: Focused change converting list_runner_config_enabled_dcs_inner from fail-fast (propagating errors) to fault-tolerant behavior (log warning and continue). Appropriate for a distributed system where individual DC lookups may transiently fail.

Positive Observations

  • Fault isolation is correct. A single unreachable DC should not abort the entire lookup; the new behavior degrades gracefully.
  • Structured logging follows project conventions. The warn! log uses structured fields (?err, namespace_id, runner_name, dc_label) and a lowercase message.
  • Import cleanup is tidy. TryStreamExt is correctly removed since try_filter_map/try_collect are replaced with non-Try equivalents.
  • Stream pipeline is correct. filter_map(std::future::ready) is the right replacement when the futures return Option<T> instead of Result<Option<T>>.
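
The difference between the fail-fast and fault-tolerant behavior described above can be illustrated with a synchronous, std-only analogue. This is a sketch under assumptions, not the PR's code: `DcLookup`, `enabled_dcs_fail_fast`, `enabled_dcs_tolerant`, `sample_lookups`, and the DC labels are hypothetical, `eprintln!` stands in for the structured `warn!` log, and `Iterator::filter_map` stands in for the stream's `filter_map(std::future::ready)`.

```rust
/// Hypothetical result of asking one datacenter whether a runner config is enabled.
type DcLookup = Result<bool, String>;

/// Fail-fast variant: the first lookup error aborts the whole operation.
fn enabled_dcs_fail_fast(lookups: Vec<(&str, DcLookup)>) -> Result<Vec<String>, String> {
    let mut enabled = Vec::new();
    for (dc_label, res) in lookups {
        // `?` propagates the error, discarding results from healthy DCs.
        if res? {
            enabled.push(dc_label.to_string());
        }
    }
    Ok(enabled)
}

/// Fault-tolerant variant: a failed lookup is logged and skipped.
fn enabled_dcs_tolerant(lookups: Vec<(&str, DcLookup)>) -> Vec<String> {
    lookups
        .into_iter()
        .filter_map(|(dc_label, res)| match res {
            Ok(true) => Some(dc_label.to_string()),
            Ok(false) => None,
            Err(err) => {
                // Stand-in for a structured warning log.
                eprintln!("runner config dc lookup failed, skipping: dc_label={dc_label} err={err}");
                None
            }
        })
        .collect()
}

/// Hypothetical sample input: one DC fails, one has no config, two are enabled.
fn sample_lookups() -> Vec<(&'static str, DcLookup)> {
    vec![
        ("dc-a", Ok(true)),
        ("dc-b", Err("kv timeout".to_string())),
        ("dc-c", Ok(false)),
        ("dc-d", Ok(true)),
    ]
}
```

With the sample input, the fail-fast variant returns the `dc-b` error and nothing else, while the tolerant variant still returns `dc-a` and `dc-d`. Note that `dc-b` (failed) and `dc-c` (not enabled) both simply disappear from the tolerant result.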

Concerns

1. Silent failure semantics: When a DC lookup fails, the DC is excluded from results — indistinguishable from a DC with no runner config. If this function influences where runners are scheduled, transient storage errors could silently prevent runners from being deployed to otherwise-healthy DCs. Consider adding a comment near the Err arm explaining why exclusion-on-failure is the intended behavior, so future readers understand it is deliberate.

2. No test coverage for the error path: A test verifying that one failing DC does not abort the overall result (and that successful DCs are still returned) would harden this against regressions.
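
One possible way to address both concerns is sketched below, under stated assumptions. `DcLookupOutcome`, `collect_enabled_dcs`, `stub_lookup`, and the DC labels are hypothetical names for illustration and do not come from the PR. Recording failed DC labels separately keeps a failed lookup distinguishable from a DC with no runner config, and injecting the lookup as a closure makes the error path straightforward to exercise in a test.

```rust
/// Hypothetical outcome type: keeps failed DCs separate from DCs that
/// simply have no runner config.
#[derive(Debug, Default, PartialEq)]
struct DcLookupOutcome {
    enabled: Vec<String>,
    failed: Vec<String>,
}

/// Runs `lookup` for every DC label; errors are recorded, not propagated.
fn collect_enabled_dcs<F>(dc_labels: &[&str], lookup: F) -> DcLookupOutcome
where
    F: Fn(&str) -> Result<bool, String>,
{
    let mut outcome = DcLookupOutcome::default();
    for &dc_label in dc_labels {
        match lookup(dc_label) {
            Ok(true) => outcome.enabled.push(dc_label.to_string()),
            // No runner config in this DC: intentionally excluded.
            Ok(false) => {}
            // Lookup failed: excluded from `enabled` but remembered in `failed`.
            Err(_err) => outcome.failed.push(dc_label.to_string()),
        }
    }
    outcome
}

/// Hypothetical stub standing in for a per-DC lookup; "dc-b" always fails.
fn stub_lookup(dc_label: &str) -> Result<bool, String> {
    match dc_label {
        "dc-b" => Err("storage unavailable".to_string()),
        "dc-c" => Ok(false),
        _ => Ok(true),
    }
}

/// Regression test sketch: one failing DC must not abort the overall result.
#[test]
fn failing_dc_does_not_abort_lookup() {
    let outcome = collect_enabled_dcs(&["dc-a", "dc-b", "dc-c", "dc-d"], stub_lookup);
    assert_eq!(outcome.enabled, vec!["dc-a".to_string(), "dc-d".to_string()]);
    assert_eq!(outcome.failed, vec!["dc-b".to_string()]);
}
```

Whether callers need the `failed` list depends on how the result is used. If exclusion-on-failure is the intended scheduling behavior, a comment at the error arm may be enough.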

Minor

  • buffer_unordered(512) predates this change, but 512 concurrent KV operations is quite high. Not a blocker, but worth a follow-up.

Verdict

The change is correct and an improvement over the fail-fast behavior. Main follow-ups: test coverage for the error path, and a comment explaining the exclusion-on-failure semantics.

NathanFlurry force-pushed the 04-12-fix_pegboard_isolate_runner_config_dc_lookup_failures branch from 30994fa to 8490d0e on April 13, 2026 02:28
NathanFlurry force-pushed the 04-12-docs_pegboard_note_runner_config_upsert_split_write_risk branch from b87b389 to 236286b on April 13, 2026 02:28
NathanFlurry marked this pull request as ready for review on April 13, 2026 05:07
NathanFlurry force-pushed the 04-12-docs_pegboard_note_runner_config_upsert_split_write_risk branch from 236286b to de32947 on April 13, 2026 05:38
NathanFlurry force-pushed the 04-12-fix_pegboard_isolate_runner_config_dc_lookup_failures branch from 8490d0e to 0f0530c on April 13, 2026 05:38
NathanFlurry force-pushed the 04-12-docs_pegboard_note_runner_config_upsert_split_write_risk branch from de32947 to 66df461 on April 13, 2026 05:50
NathanFlurry force-pushed the 04-12-fix_pegboard_isolate_runner_config_dc_lookup_failures branch 2 times, most recently from 7702123 to 2759537 on April 13, 2026 07:03
@github-actions
Contributor

Preview packages published to npm

Install with:

npm install rivetkit@pr-4625

All packages published as 0.0.0-pr.4625.34285f4 with tag pr-4625.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-34285f4
docker pull rivetdev/engine:full-34285f4

Individual packages

npm install rivetkit@pr-4625
npm install @rivetkit/react@pr-4625
npm install @rivetkit/rivetkit-native@pr-4625
npm install @rivetkit/sqlite-wasm@pr-4625
npm install @rivetkit/workflow-engine@pr-4625

Contributor

MasterPtato commented Apr 13, 2026

Merge activity

  • Apr 13, 8:40 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 13, 8:58 PM UTC: Graphite rebased this pull request as part of a merge.
  • Apr 13, 8:59 PM UTC: @MasterPtato merged this pull request with Graphite.

MasterPtato changed the base branch from 04-12-docs_pegboard_note_runner_config_upsert_split_write_risk to graphite-base/4625 on April 13, 2026 20:54
MasterPtato changed the base branch from graphite-base/4625 to main on April 13, 2026 20:56
2 participants