Skip to content

Reread completed job before process_results#755

Merged
epompeii merged 1 commit intodevelfrom
u/ep/update-job
Mar 30, 2026
Merged

Reread completed job before process_results#755
epompeii merged 1 commit intodevelfrom
u/ep/update-job

Conversation

@epompeii
Copy link
Copy Markdown
Member

This changeset moves over to rereading a completed job before calling process_results. Previously, the current job was only reread on the unhappy path. However, this lead to a stale job being used when processing the Job's results.

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

PR: #755
Base: devel
Head: u/ep/update-job
Commit: f650dd800b6b04872c4450b8dcdb1c5898809a55


Here's my review:


PR Review: "Reread completed job before process_results"

Summary

Fixes a bug where the in-memory QueryJob (loaded at claim time) had stale timestamps — specifically started was unset — causing process_results to compute a job duration of 0. The fix re-reads the job from the database unconditionally after the status update to get fresh timestamps.

Code Changes (plus/api_runners/src/channel.rs)

The fix is correct and well-targeted. Moving the QueryJob::get re-read before the updated == 0 branch means:

  1. The fresh job is used for both the error-path status checks and the happy-path process_results / store_job_output calls.
  2. The &job change on store_job_output is correct — job is now an owned local, so it needs borrowing.

One minor concern — extra read on the error path: Previously the re-read only happened when updated == 0 (the rare/error case). Now it happens unconditionally, including the happy path. This is the right trade-off since correctness requires it, but worth noting it adds one read query per successful completion. Given this is a job-completion path (not a hot loop), this is fine.

Potential race condition consideration: Between the execute_if_either_status write and the QueryJob::get read, another process could theoretically modify the job. In practice this is safe because:

  • The write already transitioned the status, so the re-read will see Completed.
  • The timestamps (started, claimed) are set earlier in the lifecycle and are immutable after that point.

Connection macro usage: The re-read uses auth_conn! (read-only) which is appropriate since it's just a SELECT. Good.

Test (plus/api_runners/tests/jobs/websocket.rs)

Well-structured regression test. It:

  • Uses Clock::Custom with AtomicI64 for deterministic time — compliant with CLAUDE.md's requirement for no wall-clock time in tests.
  • Advances mock clock by 120s between Running and Completed, then asserts the recorded job_duration equals 120.
  • Follows the existing test patterns in the file.

One nit: The test uses let mut conn = server.db_conn(); to directly query job_duration_by_report. This is a pragmatic approach for verifying the regression, though it couples the test to the schema. Acceptable for a regression test.

Verdict

Approve. Clean, minimal fix for a real bug with a solid regression test. No security, performance, or standards concerns.


Model: claude-opus-4-6

@epompeii epompeii merged commit bfe52df into devel Mar 30, 2026
52 of 53 checks passed
@epompeii epompeii deleted the u/ep/update-job branch March 30, 2026 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant