Rerun terminally-failed scheduled jobs in place, with failure backoff#48
Merged
Merged
Conversation
A non-concurrent row gates its next dispatch on QueuedJobsTable::isQueued(). Once cakephp-queue records terminal "aborted" state, a dead job stops counting as queued, so the schedule resumes — but it would then queue a brand-new job every tick, leaving one failed row behind per interval (e.g. ~1440/day for an every-minute task). #1 — Rerun in place: when the row's previous dispatch terminally failed (queue status = aborted), run() resets that same job (clears attempts and the stale aborted status) instead of creating a new one. Only one recycled row per logical scheduled job, no pile-up. Checked before the in-flight hold-back so it works whether or not the installed queue release already excludes aborted jobs from isQueued() — an aborted job is terminal and never blocks. Still-retrying jobs are untouched: while attempts remain the job is genuinely in flight and the next tick is held back, so there is no early re-dispatch. #2 — Backoff: a new `consecutive_failures` counter tracks aborts without an intervening success. After `QueueScheduler.maxConsecutiveFailures` (default 0 = unlimited) the row is disabled and a `QueueScheduler.Row.disabled` event is fired so the host app can alert. A successful/fresh dispatch resets the counter to 0. The disable write commits while run() still reports "not dispatched": the transactional() callback returns true (commit) for both a dispatch and a backoff-disable, and only false (rollback) for hold-back / lost compare-and-swap. The actual dispatched/not-dispatched result is tracked in an outer flag so the disable is not rolled back. Adds the `consecutive_failures` column (migration + fixture + entity), three SchedulerRowsTable tests, and documents the config key in docs/README.md and config/app.example.php. Pairs with dereuromark/cakephp-queue#504 (which persists the terminal aborted state); dormant and behaviour-preserving on older queue releases.
JSON_THROW_ON_ERROR was passed as the 3rd argument ($depth) instead of the 4th ($flags), so the throw-on-error behaviour never actually engaged and a malformed param would have returned null rather than throwing. Pass an explicit depth (512, the PHP default) and the flag in its proper slot. Newer phpstan flags this as argument.invalidConstant; the bug predates this branch but only surfaced now via a phpstan release.
- maxConsecutiveFailures now counts reruns granted rather than aborts: the cap is checked before incrementing, so re-enabling a disabled row recovers through the in-place rerun path on every queue version. Previously a cap of 1 could never retry, and re-enabling immediately re-disabled the row. - Reset the consecutive_failures streak in beforeSave() when a row is re-enabled, granting it a fresh round of reruns. - Dispatch the QueueScheduler.Row.disabled event and log the warning after the transaction commits, so a listener never runs mid-transaction. - Tidy a detached test docblock left over from the earlier insertion.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pairs with dereuromark/cakephp-queue#504 (which persists terminal
abortedstate). Dormant and behaviour-preserving on older queue releases.Problem
A non-concurrent row gates its next dispatch on
QueuedJobsTable::isQueued(). Once the queue records terminalabortedstate, a dead job stops counting as queued and the schedule resumes — but it would then queue a brand-new job every tick, leaving one failed row behind per interval (≈1440/day for an every-minute task). That's a different failure mode from the wedge, and it's noisy.1 — Rerun in place
When the row's previous dispatch terminally failed (queue status =
aborted),run()resets that same job (clears attempts + the stale aborted status) instead of creating a new one. One recycled row per logical scheduled job, no pile-up.isQueued()(#504) — an aborted job is terminal and never blocks.2 — Backoff + disable + alert
A new
consecutive_failurescounter tracks aborts without an intervening success. AfterQueueScheduler.maxConsecutiveFailures(default0= unlimited) the row is disabled and aQueueScheduler.Row.disabledevent is dispatched so the host app can alert. A successful / fresh dispatch resets the counter.Transaction note
The disable write must commit while
run()still reports "not dispatched". Thetransactional()callback returnstrue(commit) for both a dispatch and a backoff-disable, and onlyfalse(rollback) for hold-back / lost compare-and-swap. The actual dispatched result is tracked in an outer flag — returningfalsefrom the callback would roll the disable back (that was a real bug caught in testing).Changes
consecutive_failurescolumn: migration + fixture + entity property.run()rerun/backoff logic +abortedJobId()helper.testRunRerunsAbortedJobInsteadOfCreatingNew,testRunDisablesRowAfterMaxConsecutiveFailures,testRunResetsFailureCounterOnFreshDispatch).docs/README.md+config/app.example.php.SchedulerRowsTableTestgreen (28); phpcs clean; phpstan clean on the changed table file. (A pre-existingjson_decodeflags/depth phpstan note inSchedulerRow::_getJobData()is unrelated and only surfaces under newer phpstan than the pinned CI version.)