[scheduler/cuebot] Replace Redis-backed accounting with in-memory store + PG LISTEN/NOTIFY#2472
Draft
DiegoTavares wants to merge 3 commits into
Draft
[scheduler/cuebot] Replace Redis-backed accounting with in-memory store + PG LISTEN/NOTIFY#2472DiegoTavares wants to merge 3 commits into
DiegoTavares wants to merge 3 commits into
Conversation
…re + PG LISTEN/NOTIFY The accounting subsystem coordinated Cuebot and the Rust scheduler through Redis to enable horizontal scaling across N scheduler instances. The scheduler is and will remain single-instance (N=1), so Redis's only unique benefit is unreachable while it manufactured an entire class of accounting-drift bugs (limit-seeding fail-closed, mass dispatch rejection, double-booking, CAS starvation). A single in-process counter is the source of truth that makes that bug class structurally impossible. All shows were already drained to Cuebot-managed, so the cutover is clean (deploy into an idle role, then flip shows back one at a time via `b_scheduler_managed`). **Scheduler** — Redis → in-memory `Store`: - `accounting/store.rs`: one `Mutex`, atomic check-and-increment across the three enforced vertices (subscription burst, folder/job max cores+gpus). Layer/point were incremented but never read, so they're dropped. - Live updates via PG `LISTEN/NOTIFY` (`accounting/listener.rs`): `acct_release` and `acct_limit_change`. - Recompute from `SUM(proc)` is the backstop (absolute overwrite, no CAS), carrying in-flight bookings forward via an epoch double-buffer so it can never erase a not-yet-snapshot-visible booking → never over-books a hard cap. - Blocking seeds gate dispatch: bootstrap and managed-flip both seed caps **and** booked counters before enforcing. - Deleted `redis_client.rs`, `lua.rs`, `acct:seq`/CAS, the `redis` dependency. **Cuebot** — `LettuceAccountingRedisPublisher` → `AccountingNotifier`: - Transactional `pg_notify` on proc release (same txn as `DELETE proc` → delivered iff it commits, a stronger model than the old afterCommit publish) and on the five enforced admin cap changes. - `accounting.redis.*` and the Lettuce dep removed; replaced by a safe `accounting.notify.enabled` kill-switch (off → scheduler degrades to recompute-only, which under-books, never over-books). The old over-booking startup guardrail is gone. **Docs** — `redis-accounting.md` → `scheduler-accounting.md` (full rewrite) plus `scheduler.md`, `deploying-scheduler.md`, stress-testing, and properties. - Caps are hard (license/OOM); every failure mode is safe-direction: a dropped NOTIFY leaves a counter reading high → under-book → healed by the next recompute. - N=1 is now an assumption of the in-memory design; multi-scheduler would need a shared store again (revisit trigger documented). - Rust: 177 lib tests pass (incl. straddle, managed-flip, dropped-NOTIFY invariants); clippy clean; stress suite compiles; bin builds. - Cuebot: `compileJava`/`compileTestJava`/`spotlessJavaCheck` pass (JDK 11). rollback).
Contributor
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The accounting subsystem coordinated Cuebot and the Rust scheduler through Redis to enable horizontal scaling across N scheduler instances. The scheduler is and will remain single-instance (N=1), so Redis's only unique benefit is unreachable while it manufactured an entire class of accounting-drift bugs (limit-seeding fail-closed, mass dispatch rejection, double-booking, CAS starvation). A single in-process counter is the source of truth that makes that bug class structurally impossible.
All shows were already drained to Cuebot-managed, so the cutover is clean (deploy into an idle role, then flip shows back one at a time via
b_scheduler_managed).Scheduler — Redis → in-memory
Store:accounting/store.rs: oneMutex, atomic check-and-increment across the three enforced vertices (subscription burst, folder/job max cores+gpus). Layer/point were incremented but never read, so they're dropped.LISTEN/NOTIFY(accounting/listener.rs):acct_releaseandacct_limit_change.SUM(proc)is the backstop (absolute overwrite, no CAS), carrying in-flight bookings forward via an epoch double-buffer so it can never erase a not-yet-snapshot-visible booking → never over-books a hard cap.redis_client.rs,lua.rs,acct:seq/CAS, theredisdependency.Cuebot —
LettuceAccountingRedisPublisher→AccountingNotifier:pg_notifyon proc release (same txn asDELETE proc→ delivered iff it commits, a stronger model than the old afterCommit publish) and on the five enforced admin cap changes.accounting.redis.*and the Lettuce dep removed; replaced by a safeaccounting.notify.enabledkill-switch (off → scheduler degrades to recompute-only, which under-books, never over-books). The old over-booking startup guardrail is gone.Docs —
redis-accounting.md→scheduler-accounting.md(full rewrite) plusscheduler.md,deploying-scheduler.md, stress-testing, and properties.Caps are hard (license/OOM); every failure mode is safe-direction: a dropped NOTIFY leaves a counter reading high → under-book → healed by the next recompute.
N=1 is now an assumption of the in-memory design; multi-scheduler would need a shared store again (revisit trigger documented).
Rust: 177 lib tests pass (incl. straddle, managed-flip, dropped-NOTIFY invariants); clippy clean; stress suite compiles; bin builds.
Cuebot:
compileJava/compileTestJava/spotlessJavaCheckpass (JDK 11). rollback).