Runtime optimizations for backlogger. Caching to optimize from O(n) to O(1). #23
Conversation
No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

Adds cache-backed pending-job statistics and a mutation/versioning mechanism to avoid recomputation when the queue/backlog are unchanged (environment.py). Adjusts plot filename tokens to include `drop_weight` and relabel `job_age_weight` (plot.py).
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

✅ Passed checks (2 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/plot.py (1)
272-272: ⚠️ Potential issue | 🟡 Minor

Inconsistent prefix format in `plot_cumulative_savings` — missing `drop_weight` and using the old `d` label for `job_age`.

Line 94 was updated to `e{eff}_p{price}_i{idle}_a{job_age}_d{drop}`, but line 272 still uses the old format `e{eff}_p{price}_i{idle}_d{job_age}`, which:

- Labels `job_age_weight` as `d` instead of `a`.
- Omits `drop_weight` entirely.

Proposed fix:

```diff
-    prefix = f"e{env.weights.efficiency_weight}_p{env.weights.price_weight}_i{env.weights.idle_weight}_d{env.weights.job_age_weight}"
+    prefix = f"e{env.weights.efficiency_weight}_p{env.weights.price_weight}_i{env.weights.idle_weight}_a{env.weights.job_age_weight}_d{env.weights.drop_weight}"
```
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around lines 215-220: The fast-path cache that returns early when `_cached_queue_count == queue_count` and `_cached_backlog_size == current_backlog_size` is unsafe, because equal counts can hide content changes. Replace this count-only invalidation with a robust versioning or hashing approach: introduce a monotonically increasing mutation/version id (e.g., `_queue_backlog_version`) that is incremented on every queue or backlog mutation (every place that modifies the queue/backlog, such as the enqueue/assign/refill routines), store that version alongside the cached values (e.g., `_cached_queue_backlog_version`), and only skip recalculation when the versions match. Alternatively, compute and compare a lightweight content checksum of the active queue/backlog before skipping. Ensure the fields referenced in observations (`pending_core_hours`, `pending_avg_duration`, `pending_max_nodes`) are recomputed unless the version/hash indicates no mutation.
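As a concrete illustration of the versioning approach the prompt above sketches, here is a minimal, self-contained example. Every name in it (`VersionedPendingStats`, `_bump_version`, `replace_job`, the dict-based stats) is an assumption for illustration, not the actual `environment.py` API:

```python
class VersionedPendingStats:
    """Hypothetical sketch of version-based cache invalidation."""

    def __init__(self):
        self.queue = []
        self.backlog = []            # items: dicts with "duration" and "cores"
        self._version = 0            # bumped on every mutation
        self._cached_version = -1    # version the cached stats were built at
        self._cached_stats = None

    def _bump_version(self):
        self._version += 1

    def enqueue(self, job):
        self.backlog.append(job)
        self._bump_version()

    def replace_job(self, index, job):
        # Sizes stay identical here, so a count-only cache would serve
        # stale stats; the version bump catches the mutation anyway.
        self.backlog[index] = job
        self._bump_version()

    def pending_stats(self):
        # Recompute only if a mutation happened since the last call.
        if self._cached_version == self._version:
            return self._cached_stats
        pending = self.queue + self.backlog
        durations = [j["duration"] for j in pending]
        self._cached_stats = {
            # Units are illustrative only.
            "pending_core_hours": sum(j["duration"] * j["cores"] for j in pending),
            "pending_avg_duration": sum(durations) / len(durations) if durations else 0.0,
        }
        self._cached_version = self._version
        return self._cached_stats
```

The key property: the fast path depends only on whether any mutation occurred, never on whether the mutation happened to leave the sizes unchanged.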
🧹 Nitpick comments (2)
src/environment.py (2)
216-217: Redundant `hasattr` guards — attributes are always initialized in `_reset_timeline_state`.

`_cached_queue_count` and `_cached_backlog_size` are initialized at lines 207-208 in `_reset_timeline_state`, which is called from `__init__` before any `step()` can invoke `_update_pending_job_stats`. The `hasattr` checks add unnecessary overhead on every step.

Proposed simplification (assuming the cache logic itself is fixed):

```diff
-        if (hasattr(self, '_cached_queue_count') and
-                hasattr(self, '_cached_backlog_size') and
-                self._cached_queue_count == queue_count and
+        if (self._cached_queue_count == queue_count and
                 self._cached_backlog_size == current_backlog_size):
```
231-235: List comprehensions over a `deque` are fine here but could use `islice` for very large backlogs.

These iterate the entire backlog deque three separate times to build three arrays. If the backlog grows large, a single-pass extraction would be more efficient:

Single-pass alternative:

```diff
-        backlog_durations = np.array([job[0] for job in self.backlog_queue], dtype=np.int32)
-        backlog_nodes = np.array([job[2] for job in self.backlog_queue], dtype=np.int32)
-        backlog_cores = np.array([job[3] for job in self.backlog_queue], dtype=np.int32)
+        backlog_data = np.array([(job[0], job[2], job[3]) for job in self.backlog_queue], dtype=np.int32)
+        backlog_durations = backlog_data[:, 0]
+        backlog_nodes = backlog_data[:, 1]
+        backlog_cores = backlog_data[:, 2]
```
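One caveat with the single-pass suggestion: `np.array` over an empty deque yields a 1-D array of shape `(0,)`, so `backlog_data[:, 0]` would raise `IndexError` on an empty backlog. A guarded sketch, where the function name and the `(duration, _, nodes, cores)` tuple layout are assumptions inferred from the diff above:

```python
import numpy as np
from collections import deque

def extract_backlog_columns(backlog_queue):
    """Single-pass column extraction with an empty-backlog guard."""
    if not backlog_queue:
        empty = np.empty(0, dtype=np.int32)
        return empty, empty, empty
    data = np.array([(job[0], job[2], job[3]) for job in backlog_queue],
                    dtype=np.int32)
    return data[:, 0], data[:, 1], data[:, 2]

# Usage with the assumed tuple layout:
backlog = deque([(120, "job-a", 4, 64), (30, "job-b", 1, 16)])
durations, nodes, cores = extract_backlog_columns(backlog)
```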
Previously, `_update_pending_job_stats()` was converting the entire backlog deque to a list and then to a numpy array every single step, resulting in O(n) time complexity where n = backlog size. With unbounded backlog growth, this caused severe FPS degradation as training progressed.

Changes:

- Optimize deque-to-array conversion: extract columns directly without intermediate `list()` materialization.
- Add caching layer: skip recalculation when queue and backlog sizes are unchanged (O(1) cache hit on most steps).
- Initialize cache state in `_reset_timeline_state()` so the first call always recalculates and later calls use the cached values.

This reduces `_update_pending_job_stats()` from O(n) to O(1) on stable steps, eliminating the training stalls observed with large backlogs.

Plus: small fix in plot.py — saved plots are now named correctly.
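For readers following along, a minimal sketch of the size-keyed cache the commit message describes; the class and attribute names are illustrative stand-ins, not the actual `environment.py` code, and the count-only fast path is exactly what the actionable comment above flags as unsafe when contents change while sizes stay equal:

```python
import numpy as np
from collections import deque

class TimelineEnvSketch:
    """Illustrative stand-in for the real environment (names assumed)."""

    def __init__(self):
        self.job_queue = deque()
        self.backlog_queue = deque()  # items: (duration, job_id, nodes, cores)
        self._reset_timeline_state()

    def _reset_timeline_state(self):
        # Sentinels guarantee the first _update_pending_job_stats recomputes.
        self._cached_queue_count = -1
        self._cached_backlog_size = -1

    def _update_pending_job_stats(self):
        queue_count = len(self.job_queue)
        backlog_size = len(self.backlog_queue)

        # O(1) fast path: sizes unchanged since the previous step.
        if (self._cached_queue_count == queue_count and
                self._cached_backlog_size == backlog_size):
            return

        # O(n) slow path: build the column array straight from the deque,
        # with no intermediate list() materialization.
        durations = np.fromiter((job[0] for job in self.backlog_queue),
                                dtype=np.int32, count=backlog_size)
        self.pending_avg_duration = float(durations.mean()) if backlog_size else 0.0

        self._cached_queue_count = queue_count
        self._cached_backlog_size = backlog_size
```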
Force-pushed from 0338183 to 7b36496
No description provided.