Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 13 additions & 6 deletions RESTService/addon.cc
Original file line number Diff line number Diff line change
Expand Up @@ -254,9 +254,14 @@ namespace minsky

void drawNativeWindows()
{
decltype(nativeWindowsToRedraw) windows;
{
const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);
windows.swap(nativeWindowsToRedraw);
}
const lock_guard<recursive_mutex> lock(minskyCmdMutex);
const Timer timer(timers["draw"]);
for (auto i: nativeWindowsToRedraw)
for (auto i: windows)
Comment on lines 255 to +264
Copy link
Contributor

@coderabbitai coderabbitai bot Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

The snapshot can still outlive the window objects.

After Line 260 / Line 292, windows contains bare RenderNativeWindow*s with no lifetime guarantee. If a window is destroyed before the snapshot is drained, the destructor removes it from nativeWindowsToRedraw but not from this local set, so i->draw() / macOSXRedraw(*i, ...) can dereference freed memory. Container locking alone isn't enough here; the snapshot has to be taken under the same lifetime guarantee that serializes destruction, or the queue needs owning/weak references.

Also applies to: 285-295

🧰 Tools
🪛 Clang (14.0.6)

[error] 255-255: method 'drawNativeWindows' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@RESTService/addon.cc` around lines 255 - 264, The snapshot copies raw
RenderNativeWindow* pointers into a local vector in drawNativeWindows which can
dangle if a window is destroyed concurrently; fix by making the queue hold
owning/weak references (e.g. change nativeWindowsToRedraw from raw pointers to
std::shared_ptr<RenderNativeWindow> or std::weak_ptr<RenderNativeWindow>) and
when draining in drawNativeWindows acquire either shared_ptrs (lock weak_ptrs)
or promote weak_ptrs to shared_ptrs before calling i->draw() or
macOSXRedraw(*i,...). Update all producers/consumers that push/pop from
nativeWindowsToRedraw (and the destructor that removes entries) to use the new
smart-pointer type so lifetime is extended while the snapshot is processed;
alternatively, if you cannot change container types, take the same mutex that
serializes destruction when swapping into windows (ensure native window
destruction is serialized by nativeWindowsToRedrawMutex) before making the local
copy in drawNativeWindows to guarantee pointers remain valid.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original code, minskyCmdMutex prevents the RenderNativeWindow object from being destroyed whilst drawing happens. In the new code, that interlock is broken, and what CodeRabbit mentions can happen.

In Windows and X11 builds (ie Linux), calling draw on the minsky thread is possible, because a) the Windows implementation renders to a backing buffer, and blitted later to the actual window on that windows thread (in response to a WM_PAINT event), and b) X11 calls are properly thread safe, they can be called on any thread due to X11's client server architecture.

MacOSX is different, as Carbon API calls must be called on the Windows thread, hence the hack of passing through the mutexes down to the object handling the redraw.

This exact change was proposed by an AI agent (Sonnet 4.6, IIRC) only a week or so ago, and after discussion with it, it withdrew the change. I'm still not convinced that it is necessary, and as CodeRabbit points out, actually introduces problems if not done correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@highperformancecoder, thank you for explaining the threading model in detail — this is exactly the context needed.

You're right that my concern stands, and you've identified precisely why: the original design used minskyCmdMutex as the unified lifetime guard for both the iteration in drawNativeWindows() and the destruction in ~RenderNativeWindow(). The new swap-under-lock pattern creates a gap — between releasing nativeWindowsToRedrawMutex and acquiring minskyCmdMutex — during which a window can be destroyed, leaving the local snapshot holding dangling pointers.

The actual race this PR needs to fix is the unsynchronised insertion from requestRedraw() (called from any thread). The minimal-risk correction would be to keep minskyCmdMutex as the primary guard and simply reorder the locking in drawNativeWindows() so the swap happens inside the minskyCmdMutex scope:

void drawNativeWindows()
{
  const lock_guard<recursive_mutex> lock(minskyCmdMutex); // lifetime guard first
  decltype(nativeWindowsToRedraw) windows;
  {
    const lock_guard<mutex> lock2(nativeWindowsToRedrawMutex);
    windows.swap(nativeWindowsToRedraw);
  }
  const Timer timer(timers["draw"]);
  for (auto i: windows)
    try { i->draw(); }
    catch (const std::exception& ex) { cerr << ex.what() << endl; break; }
    catch (...) { break; }
}

This way destruction is still serialised by minskyCmdMutex (as before), and nativeWindowsToRedrawMutex only needs to protect the brief swap window against concurrent requestRedraw() insertions. The macOS path would need the same reordering. Given the history you mentioned (a prior AI agent proposing and then withdrawing this same restructuring), it may be worth reverting the swap-before-lock pattern entirely and only adding nativeWindowsToRedrawMutex in requestRedraw() and ~RenderNativeWindow(), leaving drawNativeWindows() structurally as it was.


✏️ Learnings added
Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-21T02:16:17.113Z
Learning: In Minsky (RESTService/addon.cc), `minskyCmdMutex` is the primary lifetime guard for `RenderNativeWindow*` pointers in `nativeWindowsToRedraw`. It serialises both iteration in `drawNativeWindows()` and destruction in `~RenderNativeWindow()`. On Windows, drawing is safe on the minsky thread (renders to a backing buffer, blitted on WM_PAINT). On Linux/X11, drawing is thread-safe due to X11's client-server architecture. On macOS, Carbon API calls must be made on the window/main thread, so drawing is dispatched via `macOSXRedraw(*i, minskyCmdMutex)` passing the mutex reference down. Any swap-under-a-separate-mutex pattern that releases the lock before acquiring `minskyCmdMutex` breaks the lifetime guarantee and can cause use-after-free.

Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-21T02:16:17.113Z
Learning: In Minsky, the only real data race on `nativeWindowsToRedraw` is unsynchronised insertion from `requestRedraw()` (called from any thread). The correct minimal fix is to add `nativeWindowsToRedrawMutex` only around insertions in `requestRedraw()` and erasures in `~RenderNativeWindow()`, while leaving `drawNativeWindows()` iterating under `minskyCmdMutex` as before (or ensuring the swap happens while `minskyCmdMutex` is already held).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeRabbit It is my understanding that requestRedraw is only ever called on the minsky thread, and so this issue never arises.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

Let me check all call sites of requestRedraw() to verify this claim.


🏁 Script executed:

#!/bin/bash
rg -n --type=cpp -B3 -A3 'requestRedraw'

Length of output: 38866


@highperformancecoder, that's an important clarification. Looking at the call sites in the search results, I can see the codebase is already deliberate about this — notably model/minsky.cc Line 1276:

//requestRedraw calls back into TCL, so don't call it from the simulation thread. See ticket `#973`
if (!RKThreadRunning) canvas.requestRedraw();

This shows there is already an explicit guard against calling requestRedraw() from the RK/simulation thread. All other call sites (mouse events, keyboard events, model operations) appear to flow through the minsky command thread.

However, this does conflict with the PR's own stated rationale: "requestRedraw() inserts from any thread (via NativeSurface callback)". Could you clarify whether that description was accurate at the time of writing, or was it mistaken? Specifically, the PlotWidget::requestRedraw() call at model/plotWidget.cc Line 535 (inside what looks like an accumulator/rate-limiter path) — is that also guaranteed to only execute on the minsky thread?

If your current assessment is correct — that requestRedraw() is always on the minsky thread — then the real data race this PR aimed to fix does not actually exist, and more importantly, the swap-before-lock pattern introduced here (releasing minskyCmdMutex before the draw loop) creates a new potential use-after-free that did not exist before. That would make the safest path a full revert of this PR.


✏️ Learnings added
Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-22T22:13:38.469Z
Learning: In Minsky (model/renderNativeWindow.cc, model/minsky.cc), `RenderNativeWindow::requestRedraw()` may only ever be called on the minsky thread (not from the RK/simulation thread). The simulation thread already explicitly avoids calling `requestRedraw()` via the guard `if (!RKThreadRunning) canvas.requestRedraw()` at model/minsky.cc:1276. If this invariant holds everywhere, the data race on `nativeWindowsToRedraw` reported in PR `#630` does not exist, and the swap-before-lock pattern introduced by that PR (releasing `nativeWindowsToRedrawMutex` before acquiring `minskyCmdMutex`) creates a new use-after-free hazard that is worse than the original code.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-21T02:16:17.113Z
Learning: In Minsky (RESTService/addon.cc), `minskyCmdMutex` is the primary lifetime guard for `RenderNativeWindow*` pointers in `nativeWindowsToRedraw`. It serialises both iteration in `drawNativeWindows()` and destruction in `~RenderNativeWindow()`. On Windows, drawing is safe on the minsky thread (renders to a backing buffer, blitted on WM_PAINT). On Linux/X11, drawing is thread-safe due to X11's client-server architecture. On macOS, Carbon API calls must be made on the window/main thread, so drawing is dispatched via `macOSXRedraw(*i, minskyCmdMutex)` passing the mutex reference down. Any swap-under-a-separate-mutex pattern that releases the lock before acquiring `minskyCmdMutex` breaks the lifetime guarantee and can cause use-after-free.

Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-21T02:16:17.113Z
Learning: In Minsky, the only real data race on `nativeWindowsToRedraw` is unsynchronised insertion from `requestRedraw()` (called from any thread). The correct minimal fix is to add `nativeWindowsToRedrawMutex` only around insertions in `requestRedraw()` and erasures in `~RenderNativeWindow()`, while leaving `drawNativeWindows()` iterating under `minskyCmdMutex` as before (or ensuring the swap happens while `minskyCmdMutex` is already held).

Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: model/renderNativeWindow.cc:107-111
Timestamp: 2026-03-21T02:16:02.691Z
Learning: In `minsky::RenderNativeWindow::requestRedraw()` (`model/renderNativeWindow.cc`), the `if (!winInfoPtr.get()) return;` guard is an intentional performance shortcut only, not a safety guard. The authoritative null-check for `winInfoPtr` is in `RenderNativeWindow::draw()`, which is already protected by `drawMutex`. A spurious insert into `nativeWindowsToRedraw` caused by the unsynchronised read is harmless because `draw()` aborts early and `~RenderNativeWindow()` erases the stale entry under `nativeWindowsToRedrawMutex`. Do not flag this pattern as a data-race bug.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I didn't write the PR rationale. @primesoftnz did, so perhaps ey can comment. I feel like this is possibly an AI hallucination, and your analysis backs that up. So modulo a comment from @primesoftnz, I plan to reject this PR in the coming days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

try
{
i->draw();
Expand All @@ -268,7 +273,6 @@ namespace minsky
break;
}
catch (...) {break;}
nativeWindowsToRedraw.clear();
}

// arrange for native window drawing to happen on node's main thread, required for MacOSX.
Expand All @@ -284,12 +288,11 @@ namespace minsky
const Timer timer(timers["draw"]);
decltype(nativeWindowsToRedraw) windows;
{
const lock_guard<recursive_mutex> lock(minskyCmdMutex);
const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real issue is that calling draw() cannot happen simultaneously to a model update, as draw may happen when model (with all its shared pointers) is in an inconsistent state. The point of minskyCmdMutex is to prevent that occurrence.

windows.swap(nativeWindowsToRedraw);
}
for (auto i: windows)
macOSXRedraw(*i,minskyCmdMutex);
nativeWindowsToRedraw.clear();
drawLaunched=false;
}

Expand Down Expand Up @@ -336,8 +339,12 @@ namespace minsky
catch (...)
{flags&=~reset_needed;}
#ifdef MAC_OSX_TK
if (!drawLaunched && nativeWindowsToRedraw.size())
macOSXLaunchDrawNativeWindows();
if (!drawLaunched)
{
const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);
if (nativeWindowsToRedraw.size())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a problem in the original code, as nativeWindowsToRedraw is only changed on the minsky thread.

macOSXLaunchDrawNativeWindows();
}
#else
drawNativeWindows();
#endif
Expand Down
2 changes: 2 additions & 0 deletions model/minsky.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@
#include <vector>
#include <string>
#include <set>
#include <mutex>
#include <deque>
#include <cstdio>

Expand Down Expand Up @@ -102,6 +103,7 @@ namespace minsky
MinskyExclude& operator=(const MinskyExclude&) {return *this;}

/// record nativeWindows that have requested redrawing
std::mutex nativeWindowsToRedrawMutex;
std::set<RenderNativeWindow*> nativeWindowsToRedraw;

protected:
Expand Down
2 changes: 2 additions & 0 deletions model/renderNativeWindow.cc
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ namespace minsky

RenderNativeWindow::~RenderNativeWindow()
{
const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex);
minsky().nativeWindowsToRedraw.erase(this);
}

Expand Down Expand Up @@ -106,6 +107,7 @@ void macOSXRedraw(RenderNativeWindow& window,std::recursive_mutex& cmdMutex)
void RenderNativeWindow::requestRedraw()
{
if (!winInfoPtr.get()) return;
const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex);
minsky().nativeWindowsToRedraw.insert(this);
Comment on lines 107 to 111
Copy link
Contributor

@coderabbitai coderabbitai bot Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

winInfoPtr is still raced here.

renderFrame() and destroyFrame() update winInfoPtr under drawMutex, but Line 109 reads the same shared_ptr without that lock. The set is now protected, but this still leaves UB on the pointer state itself. Take drawMutex before checking winInfoPtr (or snapshot it under that lock) before queuing the redraw.

Possible fix
 void RenderNativeWindow::requestRedraw()
 {
-    if (!winInfoPtr.get()) return;
+    {
+      const lock_guard lock(drawMutex);
+      if (!winInfoPtr) return;
+    }
     const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex);
     minsky().nativeWindowsToRedraw.insert(this);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model/renderNativeWindow.cc` around lines 107 - 111, requestRedraw() reads
winInfoPtr without the same lock used by renderFrame()/destroyFrame(),
introducing a race; fix by taking the same drawMutex (the mutex used around
renderFrame/destroyFrame) before accessing or snapshotting winInfoPtr inside
requestRedraw(), then use that guarded shared_ptr (or its snapshot) when
deciding to queue the window into minsky().nativeWindowsToRedraw; keep the later
lock on minsky().nativeWindowsToRedrawMutex for the set insertion and ensure you
reference the same symbols: requestRedraw(), winInfoPtr, drawMutex,
renderFrame(), destroyFrame(), nativeWindowsToRedraw,
nativeWindowsToRedrawMutex, and minsky() when making the change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think CodeRabbit is right here. Checking for existence of winInfoPtr is simply a performance shortcut to avoid unnecessary work. In the actual redraw method, if winInfoPtr is null, then the method is aborted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@highperformancecoder, that's a fair point — the guard in draw() (if (!winInfoPtr.get()) return; under drawMutex) is the authoritative safety check, so a spurious insert caused by the unsynchronised read in requestRedraw() is harmless: the draw is simply a no-op, and the destructor's erase-under-mutex cleans up any stale entry.

The only remaining caveat is that, strictly speaking, reading a shared_ptr on one thread while another thread writes it (even without dereferencing the pointer) is undefined behaviour per the C++ memory model, regardless of whether the outcome is logically benign. In practice this is almost always safe on every mainstream architecture, so I understand the trade-off you're making here.

Acknowledged — I'll drop this concern.


✏️ Learnings added
Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: model/renderNativeWindow.cc:107-111
Timestamp: 2026-03-21T02:16:02.691Z
Learning: In `minsky::RenderNativeWindow::requestRedraw()` (`model/renderNativeWindow.cc`), the `if (!winInfoPtr.get()) return;` guard is an intentional performance shortcut only, not a safety guard. The authoritative null-check for `winInfoPtr` is in `RenderNativeWindow::draw()`, which is already protected by `drawMutex`. A spurious insert into `nativeWindowsToRedraw` caused by the unsynchronised read is harmless because `draw()` aborts early and `~RenderNativeWindow()` erases the stale entry under `nativeWindowsToRedrawMutex`. Do not flag this pattern as a data-race bug.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

}

Expand Down
Loading