Fix data race on nativeWindowsToRedraw across threads by primesoftnz · Pull Request #630 · highperformancecoder/minsky

primesoftnz · 2026-03-20T07:44:49Z

Summary

nativeWindowsToRedraw (std::set<RenderNativeWindow*>) is accessed from multiple threads without synchronisation:

requestRedraw() inserts from any thread (via NativeSurface callback)
~RenderNativeWindow() erases during model operations
drawNativeWindows() iterates and clears on the minsky thread
macOSXLaunchDrawNativeWindows() reads .size() on the minsky thread

This is a data race on std::set that can corrupt the container's internal tree, potentially leading to crashes or infinite loops during iteration.

Relationship to tickets:#1906

Not certain this addresses Steve's reported freeze. The SetWindowPos removal in 3.25.1 likely addresses the primary freeze mechanism. This fix closes a correctness issue that could cause problems under load (complex models with many native windows creating tight timing between requestRedraw and drawNativeWindows), but may not be the root cause of the reported interface freeze.

Submitting separately as a correctness fix regardless.

Approach

Add a dedicated nativeWindowsToRedrawMutex (in MinskyExclude) to protect all accesses. The draw functions swap the set into a local copy under the new mutex, then release it before acquiring minskyCmdMutex — preserving existing lock scope and avoiding lock ordering issues.

This follows the swap-then-iterate pattern already used in macOSXDrawNativeWindows().

Lock ordering safety

nativeWindowsToRedrawMutex is never held simultaneously with minskyCmdMutex:

drawNativeWindows(): acquires nativeWindowsToRedrawMutex (swap), releases it, then acquires minskyCmdMutex (draw)
requestRedraw() / ~RenderNativeWindow(): acquire only nativeWindowsToRedrawMutex
If doCommand() triggers ~RenderNativeWindow() while holding minskyCmdMutex, the inner lock on nativeWindowsToRedrawMutex is safe (consistent minskyCmdMutex → nativeWindowsToRedrawMutex ordering)

Minor fix included

macOSXDrawNativeWindows() previously called nativeWindowsToRedraw.clear() after swapping the set out, which discarded any redraw requests made during the draw cycle. Removed — the swap already empties the original set.

Files changed

model/minsky.h — add std::mutex nativeWindowsToRedrawMutex to MinskyExclude
model/renderNativeWindow.cc — lock mutex in requestRedraw() and ~RenderNativeWindow()
RESTService/addon.cc — swap-under-lock in drawNativeWindows(), macOSXDrawNativeWindows(), and size check in macOSXLaunchDrawNativeWindows()

This change is

Summary by CodeRabbit

Bug Fixes
- Enhanced thread safety for native window redraw operations on macOS to prevent race conditions and improve rendering stability.

nativeWindowsToRedraw (std::set<RenderNativeWindow*>) is accessed from multiple threads without synchronisation: - requestRedraw() inserts from any thread (via NativeSurface callback) - ~RenderNativeWindow() erases during model operations - drawNativeWindows() iterates and clears on the minsky thread - macOSXLaunchDrawNativeWindows() reads .size() on the minsky thread This is a data race on std::set that can corrupt the container's internal tree, potentially leading to crashes or infinite loops during iteration. Whether this contributes to the freeze reported in tickets:#1906 is unclear — the SetWindowPos removal in 3.25.1 likely addresses the primary mechanism. This fix closes a correctness issue that could cause problems under load (complex models with many native windows), but may not be the root cause of Steve's reported freeze. Fix: add a dedicated mutex (nativeWindowsToRedrawMutex) to protect all accesses. The draw functions swap the set into a local copy under the new mutex, then release it before acquiring minskyCmdMutex — preserving existing lock scope and avoiding lock ordering issues. Also fixes a minor existing bug where macOSXDrawNativeWindows() cleared nativeWindowsToRedraw AFTER swapping it out, discarding any redraw requests made during the draw cycle. Related to tickets:#1906

coderabbitai · 2026-03-20T07:45:07Z

📝 Walkthrough

Walkthrough

The PR introduces mutex-protected access to the native window redraw queue. It adds a new std::mutex member to coordinate redraw requests, updates RenderNativeWindow to acquire the lock when managing redraw state, and refactors drawing logic to use atomic snapshots instead of direct shared container access.

Changes

Cohort / File(s)	Summary
Synchronization Infrastructure `model/minsky.h`	Added `nativeWindowsToRedrawMutex` member to `MinskyExclude` class and included `<mutex>` header for coordination of native window redraw requests.
Client-side Redraw Requests `model/renderNativeWindow.cc`	Updated destructor and `requestRedraw()` method to acquire `nativeWindowsToRedrawMutex` before modifying the shared `nativeWindowsToRedraw` container.
Drawing Logic & Launch Coordination `RESTService/addon.cc`	Refactored `drawNativeWindows()` to snapshot `nativeWindowsToRedraw` via mutex-protected swap into a local variable; updated `macOSXDrawNativeWindows()` to use the same pattern and removed explicit clear calls; modified main run loop to lock mutex before checking redraw queue size during draw launch trigger.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

#576: Modifies synchronization around native window redraws using mutexes to prevent deadlock conditions during redraw enqueuing and processing.

Suggested reviewers

highperformancecoder

Poem

🐰 A rabbit's ode to synchronized hops:

Locks protect our redraw queue,
No deadlock tangles, just locks that are true,
Snapshots swap safely, each draw gets its turn,
Synchronized windows—the lesson we learn! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely captures the primary change: fixing a data race on the nativeWindowsToRedraw container accessed across multiple threads.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

You can make CodeRabbit's review stricter and more nitpicky using the `assertive` profile, if that's what you prefer.

Change the reviews.profile setting to assertive to make CodeRabbit's nitpick more issues in your PRs.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model/renderNativeWindow.cc`:
- Around line 107-111: requestRedraw() reads winInfoPtr without the same lock
used by renderFrame()/destroyFrame(), introducing a race; fix by taking the same
drawMutex (the mutex used around renderFrame/destroyFrame) before accessing or
snapshotting winInfoPtr inside requestRedraw(), then use that guarded shared_ptr
(or its snapshot) when deciding to queue the window into
minsky().nativeWindowsToRedraw; keep the later lock on
minsky().nativeWindowsToRedrawMutex for the set insertion and ensure you
reference the same symbols: requestRedraw(), winInfoPtr, drawMutex,
renderFrame(), destroyFrame(), nativeWindowsToRedraw,
nativeWindowsToRedrawMutex, and minsky() when making the change.

In `@RESTService/addon.cc`:
- Around line 255-264: The snapshot copies raw RenderNativeWindow* pointers into
a local vector in drawNativeWindows which can dangle if a window is destroyed
concurrently; fix by making the queue hold owning/weak references (e.g. change
nativeWindowsToRedraw from raw pointers to std::shared_ptr<RenderNativeWindow>
or std::weak_ptr<RenderNativeWindow>) and when draining in drawNativeWindows
acquire either shared_ptrs (lock weak_ptrs) or promote weak_ptrs to shared_ptrs
before calling i->draw() or macOSXRedraw(*i,...). Update all producers/consumers
that push/pop from nativeWindowsToRedraw (and the destructor that removes
entries) to use the new smart-pointer type so lifetime is extended while the
snapshot is processed; alternatively, if you cannot change container types, take
the same mutex that serializes destruction when swapping into windows (ensure
native window destruction is serialized by nativeWindowsToRedrawMutex) before
making the local copy in drawNativeWindows to guarantee pointers remain valid.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 649cd5bc-1756-42b9-b8a8-ad424d3560b1

📥 Commits

Reviewing files that changed from the base of the PR and between 967bc63 and 46781ad.

📒 Files selected for processing (3)

RESTService/addon.cc
model/minsky.h
model/renderNativeWindow.cc

coderabbitai · 2026-03-20T07:53:28Z

model/renderNativeWindow.cc

  void RenderNativeWindow::requestRedraw()
  {
    if (!winInfoPtr.get()) return;
+    const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex);
    minsky().nativeWindowsToRedraw.insert(this);


⚠️ Potential issue | 🔴 Critical

winInfoPtr is still raced here.

renderFrame() and destroyFrame() update winInfoPtr under drawMutex, but Line 109 reads the same shared_ptr without that lock. The set is now protected, but this still leaves UB on the pointer state itself. Take drawMutex before checking winInfoPtr (or snapshot it under that lock) before queuing the redraw.

Possible fix

void RenderNativeWindow::requestRedraw() { - if (!winInfoPtr.get()) return; + { + const lock_guard lock(drawMutex); + if (!winInfoPtr) return; + } const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex); minsky().nativeWindowsToRedraw.insert(this); }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@model/renderNativeWindow.cc` around lines 107 - 111, requestRedraw() reads winInfoPtr without the same lock used by renderFrame()/destroyFrame(), introducing a race; fix by taking the same drawMutex (the mutex used around renderFrame/destroyFrame) before accessing or snapshotting winInfoPtr inside requestRedraw(), then use that guarded shared_ptr (or its snapshot) when deciding to queue the window into minsky().nativeWindowsToRedraw; keep the later lock on minsky().nativeWindowsToRedrawMutex for the set insertion and ensure you reference the same symbols: requestRedraw(), winInfoPtr, drawMutex, renderFrame(), destroyFrame(), nativeWindowsToRedraw, nativeWindowsToRedrawMutex, and minsky() when making the change.

I don't think CodeRabbit is right here. Checking for existence of winInfoPtr is simply a performance shortcut to avoid unnecessary work. In the actual redraw method, if winInfoPtr is null, then the method is aborted.

@highperformancecoder, that's a fair point — the guard in draw() (if (!winInfoPtr.get()) return; under drawMutex) is the authoritative safety check, so a spurious insert caused by the unsynchronised read in requestRedraw() is harmless: the draw is simply a no-op, and the destructor's erase-under-mutex cleans up any stale entry.

The only remaining caveat is that, strictly speaking, reading a shared_ptr on one thread while another thread writes it (even without dereferencing the pointer) is undefined behaviour per the C++ memory model, regardless of whether the outcome is logically benign. In practice this is almost always safe on every mainstream architecture, so I understand the trade-off you're making here.

Acknowledged — I'll drop this concern.

✏️ Learnings added

Learnt from: highperformancecoder Repo: highperformancecoder/minsky PR: 630 File: model/renderNativeWindow.cc:107-111 Timestamp: 2026-03-21T02:16:02.691Z Learning: In `minsky::RenderNativeWindow::requestRedraw()` (`model/renderNativeWindow.cc`), the `if (!winInfoPtr.get()) return;` guard is an intentional performance shortcut only, not a safety guard. The authoritative null-check for `winInfoPtr` is in `RenderNativeWindow::draw()`, which is already protected by `drawMutex`. A spurious insert into `nativeWindowsToRedraw` caused by the unsynchronised read is harmless because `draw()` aborts early and `~RenderNativeWindow()` erases the stale entry under `nativeWindowsToRedrawMutex`. Do not flag this pattern as a data-race bug.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai · 2026-03-20T07:53:28Z

RESTService/addon.cc

      void drawNativeWindows()
      {
+        decltype(nativeWindowsToRedraw) windows;
+        {
+          const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);
+          windows.swap(nativeWindowsToRedraw);
+        }
        const lock_guard<recursive_mutex> lock(minskyCmdMutex);
        const Timer timer(timers["draw"]);
-        for (auto i: nativeWindowsToRedraw)
+        for (auto i: windows)


⚠️ Potential issue | 🔴 Critical

The snapshot can still outlive the window objects.

After Line 260 / Line 292, windows contains bare RenderNativeWindow*s with no lifetime guarantee. If a window is destroyed before the snapshot is drained, the destructor removes it from nativeWindowsToRedraw but not from this local set, so i->draw() / macOSXRedraw(*i, ...) can dereference freed memory. Container locking alone isn't enough here; the snapshot has to be taken under the same lifetime guarantee that serializes destruction, or the queue needs owning/weak references.

Also applies to: 285-295

🧰 Tools

🪛 Clang (14.0.6)

[error] 255-255: method 'drawNativeWindows' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@RESTService/addon.cc` around lines 255 - 264, The snapshot copies raw RenderNativeWindow* pointers into a local vector in drawNativeWindows which can dangle if a window is destroyed concurrently; fix by making the queue hold owning/weak references (e.g. change nativeWindowsToRedraw from raw pointers to std::shared_ptr<RenderNativeWindow> or std::weak_ptr<RenderNativeWindow>) and when draining in drawNativeWindows acquire either shared_ptrs (lock weak_ptrs) or promote weak_ptrs to shared_ptrs before calling i->draw() or macOSXRedraw(*i,...). Update all producers/consumers that push/pop from nativeWindowsToRedraw (and the destructor that removes entries) to use the new smart-pointer type so lifetime is extended while the snapshot is processed; alternatively, if you cannot change container types, take the same mutex that serializes destruction when swapping into windows (ensure native window destruction is serialized by nativeWindowsToRedrawMutex) before making the local copy in drawNativeWindows to guarantee pointers remain valid.

In the original code, minskyCmdMutex prevents the RenderNativeWindow object from being destroyed whilst drawing happens. In the new code, that interlock is broken, and what CodeRabbit mentions can happen.

In Windows and X11 builds (ie Linux), calling draw on the minsky thread is possible, because a) the Windows implementation renders to a backing buffer, and blitted later to the actual window on that windows thread (in response to a WM_PAINT event), and b) X11 calls are properly thread safe, they can be called on any thread due to X11's client server architecture.

MacOSX is different, as Carbon API calls must be called on the Windows thread, hence the hack of passing through the mutexes down to the object handling the redraw.

This exact change was proposed by an AI agent (Sonnet 4.6, IIRC) only a week or so ago, and after discussion with it, it withdrew the change. I'm still not convinced that it is necessary, and as CodeRabbit points out, actually introduces problems if not done correctly.

@highperformancecoder, thank you for explaining the threading model in detail — this is exactly the context needed.

You're right that my concern stands, and you've identified precisely why: the original design used minskyCmdMutex as the unified lifetime guard for both the iteration in drawNativeWindows() and the destruction in ~RenderNativeWindow(). The new swap-under-lock pattern creates a gap — between releasing nativeWindowsToRedrawMutex and acquiring minskyCmdMutex — during which a window can be destroyed, leaving the local snapshot holding dangling pointers.

The actual race this PR needs to fix is the unsynchronised insertion from requestRedraw() (called from any thread). The minimal-risk correction would be to keep minskyCmdMutex as the primary guard and simply reorder the locking in drawNativeWindows() so the swap happens inside the minskyCmdMutex scope:

void drawNativeWindows() { const lock_guard<recursive_mutex> lock(minskyCmdMutex); // lifetime guard first decltype(nativeWindowsToRedraw) windows; { const lock_guard<mutex> lock2(nativeWindowsToRedrawMutex); windows.swap(nativeWindowsToRedraw); } const Timer timer(timers["draw"]); for (auto i: windows) try { i->draw(); } catch (const std::exception& ex) { cerr << ex.what() << endl; break; } catch (...) { break; } }

This way destruction is still serialised by minskyCmdMutex (as before), and nativeWindowsToRedrawMutex only needs to protect the brief swap window against concurrent requestRedraw() insertions. The macOS path would need the same reordering. Given the history you mentioned (a prior AI agent proposing and then withdrawing this same restructuring), it may be worth reverting the swap-before-lock pattern entirely and only adding nativeWindowsToRedrawMutex in requestRedraw() and ~RenderNativeWindow(), leaving drawNativeWindows() structurally as it was.

✏️ Learnings added

Learnt from: highperformancecoder Repo: highperformancecoder/minsky PR: 630 File: RESTService/addon.cc:255-264 Timestamp: 2026-03-21T02:16:17.113Z Learning: In Minsky (RESTService/addon.cc), `minskyCmdMutex` is the primary lifetime guard for `RenderNativeWindow*` pointers in `nativeWindowsToRedraw`. It serialises both iteration in `drawNativeWindows()` and destruction in `~RenderNativeWindow()`. On Windows, drawing is safe on the minsky thread (renders to a backing buffer, blitted on WM_PAINT). On Linux/X11, drawing is thread-safe due to X11's client-server architecture. On macOS, Carbon API calls must be made on the window/main thread, so drawing is dispatched via `macOSXRedraw(*i, minskyCmdMutex)` passing the mutex reference down. Any swap-under-a-separate-mutex pattern that releases the lock before acquiring `minskyCmdMutex` breaks the lifetime guarantee and can cause use-after-free.

Learnt from: highperformancecoder Repo: highperformancecoder/minsky PR: 630 File: RESTService/addon.cc:255-264 Timestamp: 2026-03-21T02:16:17.113Z Learning: In Minsky, the only real data race on `nativeWindowsToRedraw` is unsynchronised insertion from `requestRedraw()` (called from any thread). The correct minimal fix is to add `nativeWindowsToRedrawMutex` only around insertions in `requestRedraw()` and erasures in `~RenderNativeWindow()`, while leaving `drawNativeWindows()` iterating under `minskyCmdMutex` as before (or ensuring the swap happens while `minskyCmdMutex` is already held).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@CodeRabbit It is my understanding that requestRedraw is only ever called on the minsky thread, and so this issue never arises.

🧩 Analysis chain

Let me check all call sites of requestRedraw() to verify this claim.

🏁 Script executed:

#!/bin/bash rg -n --type=cpp -B3 -A3 'requestRedraw'

Length of output: 38866

@highperformancecoder, that's an important clarification. Looking at the call sites in the search results, I can see the codebase is already deliberate about this — notably model/minsky.cc Line 1276:

//requestRedraw calls back into TCL, so don't call it from the simulation thread. See ticket `#973` if (!RKThreadRunning) canvas.requestRedraw();

This shows there is already an explicit guard against calling requestRedraw() from the RK/simulation thread. All other call sites (mouse events, keyboard events, model operations) appear to flow through the minsky command thread.

However, this does conflict with the PR's own stated rationale: "requestRedraw() inserts from any thread (via NativeSurface callback)". Could you clarify whether that description was accurate at the time of writing, or was it mistaken? Specifically, the PlotWidget::requestRedraw() call at model/plotWidget.cc Line 535 (inside what looks like an accumulator/rate-limiter path) — is that also guaranteed to only execute on the minsky thread?

If your current assessment is correct — that requestRedraw() is always on the minsky thread — then the real data race this PR aimed to fix does not actually exist, and more importantly, the swap-before-lock pattern introduced here (releasing minskyCmdMutex before the draw loop) creates a new potential use-after-free that did not exist before. That would make the safest path a full revert of this PR.

✏️ Learnings added

Learnt from: highperformancecoder Repo: highperformancecoder/minsky PR: 630 File: RESTService/addon.cc:255-264 Timestamp: 2026-03-22T22:13:38.469Z Learning: In Minsky (model/renderNativeWindow.cc, model/minsky.cc), `RenderNativeWindow::requestRedraw()` may only ever be called on the minsky thread (not from the RK/simulation thread). The simulation thread already explicitly avoids calling `requestRedraw()` via the guard `if (!RKThreadRunning) canvas.requestRedraw()` at model/minsky.cc:1276. If this invariant holds everywhere, the data race on `nativeWindowsToRedraw` reported in PR `#630` does not exist, and the swap-before-lock pattern introduced by that PR (releasing `nativeWindowsToRedrawMutex` before acquiring `minskyCmdMutex`) creates a new use-after-free hazard that is worse than the original code.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: highperformancecoder Repo: highperformancecoder/minsky PR: 630 File: RESTService/addon.cc:255-264 Timestamp: 2026-03-21T02:16:17.113Z Learning: In Minsky (RESTService/addon.cc), `minskyCmdMutex` is the primary lifetime guard for `RenderNativeWindow*` pointers in `nativeWindowsToRedraw`. It serialises both iteration in `drawNativeWindows()` and destruction in `~RenderNativeWindow()`. On Windows, drawing is safe on the minsky thread (renders to a backing buffer, blitted on WM_PAINT). On Linux/X11, drawing is thread-safe due to X11's client-server architecture. On macOS, Carbon API calls must be made on the window/main thread, so drawing is dispatched via `macOSXRedraw(*i, minskyCmdMutex)` passing the mutex reference down. Any swap-under-a-separate-mutex pattern that releases the lock before acquiring `minskyCmdMutex` breaks the lifetime guarantee and can cause use-after-free.

Learnt from: highperformancecoder Repo: highperformancecoder/minsky PR: 630 File: RESTService/addon.cc:255-264 Timestamp: 2026-03-21T02:16:17.113Z Learning: In Minsky, the only real data race on `nativeWindowsToRedraw` is unsynchronised insertion from `requestRedraw()` (called from any thread). The correct minimal fix is to add `nativeWindowsToRedrawMutex` only around insertions in `requestRedraw()` and erasures in `~RenderNativeWindow()`, while leaving `drawNativeWindows()` iterating under `minskyCmdMutex` as before (or ensuring the swap happens while `minskyCmdMutex` is already held).

Learnt from: highperformancecoder Repo: highperformancecoder/minsky PR: 630 File: model/renderNativeWindow.cc:107-111 Timestamp: 2026-03-21T02:16:02.691Z Learning: In `minsky::RenderNativeWindow::requestRedraw()` (`model/renderNativeWindow.cc`), the `if (!winInfoPtr.get()) return;` guard is an intentional performance shortcut only, not a safety guard. The authoritative null-check for `winInfoPtr` is in `RenderNativeWindow::draw()`, which is already protected by `drawMutex`. A spurious insert into `nativeWindowsToRedraw` caused by the unsynchronised read is harmless because `draw()` aborts early and `~RenderNativeWindow()` erases the stale entry under `nativeWindowsToRedrawMutex`. Do not flag this pattern as a data-race bug.

Well I didn't write the PR rationale. @primesoftnz did, so perhaps ey can comment. I feel like this is possibly an AI hallucination, and your analysis backs that up. So modulo a comment from @primesoftnz, I plan to reject this PR in the coming days.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

highperformancecoder

I'm not convinced that this change a) fixes anything, nor b) is implemented correctly.

This is unfortunately a very tricky bit of code, required because of requirements by MacOSX and Windows system of Windowing operations to be called from a specific thread. I'm not claiming what is there currently is ideal, but as far as I can tell it is correct.

highperformancecoder · 2026-03-21T01:54:19Z

model/renderNativeWindow.cc

  void RenderNativeWindow::requestRedraw()
  {
    if (!winInfoPtr.get()) return;
+    const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex);
    minsky().nativeWindowsToRedraw.insert(this);


I don't think CodeRabbit is right here. Checking for existence of winInfoPtr is simply a performance shortcut to avoid unnecessary work. In the actual redraw method, if winInfoPtr is null, then the method is aborted.

highperformancecoder · 2026-03-21T02:08:01Z

RESTService/addon.cc

      void drawNativeWindows()
      {
+        decltype(nativeWindowsToRedraw) windows;
+        {
+          const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);
+          windows.swap(nativeWindowsToRedraw);
+        }
        const lock_guard<recursive_mutex> lock(minskyCmdMutex);
        const Timer timer(timers["draw"]);
-        for (auto i: nativeWindowsToRedraw)
+        for (auto i: windows)


In the original code, minskyCmdMutex prevents the RenderNativeWindow object from being destroyed whilst drawing happens. In the new code, that interlock is broken, and what CodeRabbit mentions can happen.

In Windows and X11 builds (ie Linux), calling draw on the minsky thread is possible, because a) the Windows implementation renders to a backing buffer, and blitted later to the actual window on that windows thread (in response to a WM_PAINT event), and b) X11 calls are properly thread safe, they can be called on any thread due to X11's client server architecture.

MacOSX is different, as Carbon API calls must be called on the Windows thread, hence the hack of passing through the mutexes down to the object handling the redraw.

This exact change was proposed by an AI agent (Sonnet 4.6, IIRC) only a week or so ago, and after discussion with it, it withdrew the change. I'm still not convinced that it is necessary, and as CodeRabbit points out, actually introduces problems if not done correctly.

highperformancecoder · 2026-03-21T02:10:00Z

RESTService/addon.cc

        decltype(nativeWindowsToRedraw) windows;
        {
-          const lock_guard<recursive_mutex> lock(minskyCmdMutex);
+          const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);


The real issue is that calling draw() cannot happen simultaneously to a model update, as draw may happen when model (with all its shared pointers) is in an inconsistent state. The point of minskyCmdMutex is to prevent that occurrence.

highperformancecoder · 2026-03-21T02:12:06Z

RESTService/addon.cc

+                if (!drawLaunched)
+                  {
+                    const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);
+                    if (nativeWindowsToRedraw.size())


This is not a problem in the original code, as nativeWindowsToRedraw is only changed on the minsky thread.

primesoftnz · 2026-03-22T23:51:04Z

Russell, you're right — I got the threading model wrong. requestRedraw() is minsky-thread only (the !RKThreadRunning guard at minsky.cc:1276 makes that explicit), so the data race I described doesn't exist. Worse, the swap-before-lock pattern I introduced breaks the minskyCmdMutex lifetime guarantee you rely on for safe iteration. Net negative.

Closing this. Thank you for the detailed walkthrough of the threading architecture — the Windows backing buffer / X11 client-server / macOS Carbon distinction is exactly the kind of context that's hard to derive from source alone, and I've learned a lot from this exchange.

One reflection from the process: the Ravel addon binary is where a lot of the concurrency complexity materialises, and without visibility into both sides of the minskyRESTService boundary, AI tooling is working with an incomplete picture of threading invariants. Both this PR and the earlier Sonnet 4.6 attempt hit the same wall. As AI-assisted contribution becomes more common, full source visibility will make a real difference to the quality of what gets submitted.

highperformancecoder · 2026-03-23T00:49:31Z

I am reminded of Brian Kernighan's aphorism: "Kernighan's Law - Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." I would love this stuff to be simplified, actually. Part of the problem was the code was originally written in a single threaded model, and later multithreaded to "prevent blocking the user interface" (JS frontend only, the TCL/Tk frontend was always single threaded). That then required modifications in the addon code, the interface with the windowing API (3 distinctly different approaches to that) and the simulation code (Runge-Kutta simulator) and callbacks into Node. It is hard to keep all these separate parts of the codebase in your mind (or an AI's mind) as context. One possible solution is to arrange callbacks into Node and let Node handle the renedering, (already deficiencies in the Cairo rendering library have been pointed out, particularly with the Pango font rendering library), but I strongly suspect we'll run into performance issues with that approach. Already using Cairo and direct to native windowing API, we are close to what performance level is acceptable. Food for thought.

…

On Sun, Mar 22, 2026 at 04:51:25PM -0700, primesoftnz wrote: ●primesoftnz left a comment (highperformancecoder/minsky#630) Russell, you're right — I got the threading model wrong. requestRedraw() is minsky-thread only (the !RKThreadRunning guard at minsky.cc:1276 makes that explicit), so the data race I described doesn't exist. Worse, the swap-before-lock pattern I introduced breaks the minskyCmdMutex lifetime guarantee you rely on for safe iteration. Net negative. Closing this. Thank you for the detailed walkthrough of the threading architecture — the Windows backing buffer / X11 client-server / macOS Carbon distinction is exactly the kind of context that's hard to derive from source alone, and I've learned a lot from this exchange. One reflection from the process: the Ravel addon binary is where a lot of the concurrency complexity materialises, and without visibility into both sides of the minskyRESTService boundary, AI tooling is working with an incomplete picture of threading invariants. Both this PR and the earlier Sonnet 4.6 attempt hit the same wall. As AI-assisted contribution becomes more common, full source visibility will make a real difference to the quality of what gets submitted. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.*Message ID: <highperformancecoder ***@***.***>

--

---------------------------------------------------------------------------- Dr Russell Standish Phone 0425 253119 (mobile) Principal, High Performance Coders ***@***.*** http://www.hpcoders.com.au ----------------------------------------------------------------------------

primesoftnz · 2026-03-23T01:39:46Z

The Kernighan quote is spot on. Worth noting — when we were investigating the freeze for this PR, the AI's first structural observation was that the threading complexity likely stemmed from a single-threaded architecture that was later retrofitted for multi-threading. Your confirmation of exactly that is both validating and a good illustration of where AI analysis adds value: pattern recognition across a large codebase, even when it lacks the full context to get the fix right.

Greg and I hope you'll join us tonight for the Skool demo — 6pm Sydney / 8pm Auckland. We're showing how AI-driven model building works with Ravel via the REST API, including the PlotWidget fix from PR #629. Would be great to have you there.

highperformancecoder · 2026-03-23T02:07:58Z

It's tonight? Sorry I'm otherwise engaged. I thought it was Tuesday night anyway, but unfortunately I'm otherwise engaged for that too. Longstanding commitments. Maybe Tues 31st if the session runs then?

…

On Sun, Mar 22, 2026 at 06:40:07PM -0700, primesoftnz wrote: ●primesoftnz left a comment (highperformancecoder/minsky#630) The Kernighan quote is spot on. Worth noting — when we were investigating the freeze for this PR, the AI's first structural observation was that the threading complexity likely stemmed from a single-threaded architecture that was later retrofitted for multi-threading. Your confirmation of exactly that is both validating and a good illustration of where AI analysis adds value: pattern recognition across a large codebase, even when it lacks the full context to get the fix right. Greg and I hope you'll join us tonight for the Skool demo — 6pm Sydney / 8pm Auckland. We're showing how AI-driven model building works with Ravel via the REST API, including the PlotWidget fix from PR #629. Would be great to have you there. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.*Message ID: <highperformancecoder ***@***.***>

--

---------------------------------------------------------------------------- Dr Russell Standish Phone 0425 253119 (mobile) Principal, High Performance Coders ***@***.*** http://www.hpcoders.com.au ----------------------------------------------------------------------------

primesoftnz · 2026-03-23T02:53:54Z

Opps sorry Russell, my mistake. Tomorrow night. , hope to see you there. From: Russell Standish ***@***.***> Sent: Monday, 23 March 2026 3:08 pm To: highperformancecoder/minsky ***@***.***> Cc: Greg Stevenson ***@***.***>; State change ***@***.***> Subject: Re: [highperformancecoder/minsky] Fix data race on nativeWindowsToRedraw across threads (PR #630) [https://avatars.githubusercontent.com/u/3075825?s=20&v=4]highperformancecoder left a comment (highperformancecoder/minsky#630)<#630 (comment)> It's tonight? Sorry I'm otherwise engaged. I thought it was Tuesday night anyway, but unfortunately I'm otherwise engaged for that too. Longstanding commitments. Maybe Tues 31st if the session runs then?

…

On Sun, Mar 22, 2026 at 06:40:07PM -0700, primesoftnz wrote: ●primesoftnz left a comment (highperformancecoder/minsky#630) The Kernighan quote is spot on. Worth noting — when we were investigating the freeze for this PR, the AI's first structural observation was that the threading complexity likely stemmed from a single-threaded architecture that was later retrofitted for multi-threading. Your confirmation of exactly that is both validating and a good illustration of where AI analysis adds value: pattern recognition across a large codebase, even when it lacks the full context to get the fix right. Greg and I hope you'll join us tonight for the Skool demo — 6pm Sydney / 8pm Auckland. We're showing how AI-driven model building works with Ravel via the REST API, including the PlotWidget fix from PR #629. Would be great to have you there. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.*Message ID: <highperformancecoder ***@***.***<mailto:***@***.***>>

--

---------------------------------------------------------------------------- Dr Russell Standish Phone 0425 253119 (mobile) Principal, High Performance Coders ***@***.***<mailto:***@***.***> http://www.hpcoders.com.au ---------------------------------------------------------------------------- — Reply to this email directly, view it on GitHub<#630?email_source=notifications&email_token=ADTN3MQQ5HKN5KQLTKNL2FD4SCMBHA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTIMJQG42TGOBUGE22M4TFMFZW63VMON2GC5DFL5RWQYLOM5S2KZLWMVXHJLDGN5XXIZLSL5RWY2LDNM#issuecomment-4107538415>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ADTN3MTDN4G22XI4CIB6IRD4SCMBHAVCNFSM6AAAAACWY2OREGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DCMBXGUZTQNBRGU>. You are receiving this because you modified the open/close state.Message ID: ***@***.******@***.***>>

coderabbitai bot reviewed Mar 20, 2026

View reviewed changes

highperformancecoder requested changes Mar 21, 2026

View reviewed changes

primesoftnz closed this Mar 22, 2026

Uh oh!

Conversation

primesoftnz commented Mar 20, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relationship to tickets:#1906

Approach

Lock ordering safety

Minor fix included

Files changed

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

highperformancecoder left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

primesoftnz commented Mar 22, 2026

Uh oh!

highperformancecoder commented Mar 23, 2026 via email

Uh oh!

primesoftnz commented Mar 23, 2026

Uh oh!

highperformancecoder commented Mar 23, 2026 via email

Uh oh!

primesoftnz commented Mar 23, 2026 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

primesoftnz commented Mar 20, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 20, 2026 •

edited

Loading

coderabbitai bot Mar 20, 2026 •

edited

Loading

coderabbitai bot Mar 20, 2026 •

edited

Loading