
Conversation


Copilot AI commented Jan 4, 2026

The lock-free ThreadLocalStorage implementation using InterlockedCompareExchangePointer was identified as not entirely thread-safe. It has been replaced with a lock-per-bucket approach using SRW locks.

Changes

  • Bucket structure: Each hash bucket now contains a wil::srwlock and a Node* list-head pointer
  • Read path: Acquires a shared lock for concurrent traversal
  • Write path: Acquires an exclusive lock, with a double-check to prevent a race between read-lock release and write-lock acquisition
  • Test coverage: Added multi-threaded test (10 threads × 1000 iterations) validating concurrent access

Implementation

// Before: Lock-free with interlocked operations
for (auto pNode = m_hashArray[index]; pNode != nullptr; pNode = pNode->pNext) {
    if (pNode->threadId == threadId) return &pNode->value;
}
// Insert with InterlockedCompareExchangePointer...

// After: Lock-per-bucket with double-check
{
    auto lock = bucket.lock.lock_shared();
    // Search with shared lock...
}
// Allocate outside lock
auto lock = bucket.lock.lock_exclusive();
// Double-check for race, then insert

The shared-lock overhead on the read path (~5-10 ns) is negligible compared to the thread-safety benefit. The exclusive lock is acquired only once per thread per bucket.
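For illustration, here is a fuller sketch of how the lock-per-bucket GetLocal path fits together. This is a sketch only; the Node layout, bucket selection, and allocation details are assumptions for readability, not lines lifted from the actual diff, and it assumes windows.h, <new>, and wil/resource.h are available inside ThreadLocalStorage<T>:

struct Node
{
    Node* pNext;
    DWORD threadId;
    T value;
};

struct Bucket
{
    wil::srwlock lock;
    Node* head = nullptr;
};

T* GetLocal(bool shouldAllocate = false) WI_NOEXCEPT
{
    const DWORD threadId = ::GetCurrentThreadId();
    Bucket& bucket = m_buckets[threadId % ARRAYSIZE(m_buckets)];

    // Read path: a shared lock permits concurrent traversal.
    {
        auto lock = bucket.lock.lock_shared();
        for (auto pNode = bucket.head; pNode != nullptr; pNode = pNode->pNext)
        {
            if (pNode->threadId == threadId)
            {
                return &pNode->value;
            }
        }
    }

    if (!shouldAllocate)
    {
        return nullptr;
    }

    // Allocate outside the lock so a slow or failed allocation never holds it.
    auto pNew = new (std::nothrow) Node{nullptr, threadId};
    if (!pNew)
    {
        return nullptr; // GetLocal may return nullptr on allocation failure
    }

    // Write path: exclusive lock, then double-check before inserting at the head.
    auto lock = bucket.lock.lock_exclusive();
    for (auto pNode = bucket.head; pNode != nullptr; pNode = pNode->pNext)
    {
        if (pNode->threadId == threadId)
        {
            delete pNew;
            return &pNode->value;
        }
    }
    pNew->pNext = bucket.head;
    bucket.head = pNew;
    return &pNew->value;
}

As the review discussion below points out, the second scan turns out to be removable, since a thread only ever inserts its own node.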

Original prompt

This section details the original issue you should resolve

<issue_title>Revisit lock-free ThreadLocalStorage storage management</issue_title>
<issue_description>https://github.com/microsoft/wil/blob/master/include/wil/result.h#L403-L430

Internal analysis suggests that this sequence is not entirely thread safe. Some options include:

  • Use an srwlock-per-bucket (costs a function call)
  • Figure out the right sequence of acquire/release semantics for reading the hashtable bucket entries' starting pointer
  • Spin gently (using the platform's "yield" instructions as necessary)
  • Deeper analysis to prove safety with annotations and documentation

Thanks to @sudhakar-vp-zz and @cpkleynhans for the analysis! (This is http://task.ms/58950594 for future explorers.)</issue_description>

<agent_instructions>Read the code linked in the first comment, and replace its implementation with the code in the gist of the second comment. Use the existing type names (renaming the gist code as necessary). If there isn't already a test case validating this type, add one that uses multiple threads trying to access the data structure at once.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@jonwis See potential approach for "lock per bucket" here - https://gist.github.com/jonwis/f50d8f3d28a85a074cb74aa13d4c774c ... the additional AcquireSRWLockShared call to read the bucket field is "cheapish" and the "Exclusive" happens once per thread on its bucket.


Copilot AI and others added 3 commits January 4, 2026 00:38
Copilot AI changed the title [WIP] Revisit lock-free ThreadLocalStorage storage management Replace lock-free ThreadLocalStorage with SRW lock-per-bucket implementation Jan 4, 2026
Copilot AI requested a review from jonwis January 4, 2026 00:43
@jonwis jonwis marked this pull request as ready for review January 8, 2026 22:07
@jonwis jonwis requested review from dunhor and removed request for sudhakar-vp-zz January 9, 2026 04:18
// Note: Can return nullptr even when (shouldAllocate == true) upon allocation failure
T* GetLocal(bool shouldAllocate = false) WI_NOEXCEPT
{
// Get the current thread ID

Seems like a pretty unnecessary comment

ThreadLocalStorage& operator=(const ThreadLocalStorage&) = delete;

ThreadLocalStorage() = default;
~ThreadLocalStorage() WI_NOEXCEPT = default;
Suggested change
- ~ThreadLocalStorage() WI_NOEXCEPT = default;
+ ~ThreadLocalStorage() = default;

noexcept is the default for destructors
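For example (a standalone illustration, not code from the PR):

#include <type_traits>

struct S { ~S() = default; };
static_assert(std::is_nothrow_destructible_v<S>, "destructors are implicitly noexcept");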


Lol. Thanks Copilot. Teaches me to review closer.


To be fair, it was like this before, so not necessarily Copilot's fault (this time)


Note that there is /Zc:implicitNoexcept- which disables the implicit noexcept on destructors.


If folks are disabling language features, this one instance isn't going to have much of a relative impact on them...

auto node = new (newNodeStore) Node{nullptr, threadId};

return &pNew->value;
// Look again and insert the new node

It's looking for a matching thread id... Unless we're worried about interrupts (in which case deadlock is now on the table), this additional check seems unnecessary since the same thread can't race against itself.


Oh, good point. It can just acquire the lock and shove it into the head.
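A sketch of that simplification, reusing the hypothetical names from the sketch in the description: because no other thread can insert this thread's node, the exclusive section reduces to a plain push at the head:

auto lock = bucket.lock.lock_exclusive();
// No second scan needed: the only writer for this thread id is this thread itself.
pNew->pNext = bucket.head;
bucket.head = pNew;
return &pNew->value;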

@dunhor

dunhor commented Jan 10, 2026

Looking through, I'm now curious - what was the original issue? AFAICT values never get cleaned up, and if everything gets inserted at the head, there doesn't appear to be any dangerous races (not that I disapprove of the change per se; just trying to understand).

@jonwis

jonwis commented Jan 10, 2026

> Looking through, I'm now curious - what was the original issue? AFAICT values never get cleaned up, and if everything gets inserted at the head, there doesn't appear to be any dangerous races (not that I disapprove of the change per se; just trying to understand).

Cc @sudhakarvp1 as well - this was found in internal code analysis. There were two issues found, one in the process-local storage and this (potential) one in ThreadLocalStorage. The request was to not rely on msvc-specific volatile behavior (see update of m_hashArray[index] to insert), and instead use something else.

This whole thing, however, is "just an interlocked SLIST" - I could switch to SLIST_ENTRY and InterlockedPushEntrySList/InterlockedPopEntrySList instead and eliminate the code-scanning warning.

The remaining fixes in the PR mentioned in the issue (PR#13582094 and the linked bug, which doesn't resolve through task.ms for some reason) still need to be ported over.

But, as you point out, it's not that much of an enhancement. What do you think?
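For reference, the SLIST variant floated above might look something like this (a sketch with made-up names, not code from the PR; lookups would still walk the entries the same way):

// Win32 interlocked singly-linked list; entries must be aligned to
// MEMORY_ALLOCATION_ALIGNMENT (16 bytes on x64), with SLIST_ENTRY placed first.
struct alignas(MEMORY_ALLOCATION_ALIGNMENT) SListNode
{
    SLIST_ENTRY entry;
    DWORD threadId;
    T value;
};

SLIST_HEADER m_bucketHead; // one per bucket; InitializeSListHead(&m_bucketHead) once

void InsertNode(SListNode* pNew) WI_NOEXCEPT
{
    // Atomic push at the head, replacing the InterlockedCompareExchangePointer loop.
    ::InterlockedPushEntrySList(&m_bucketHead, &pNew->entry);
}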

@dunhor

dunhor commented Jan 10, 2026

> The request was to not rely on msvc-specific volatile behavior

I'm curious to learn more about such behavioral differences

@jonwis

jonwis commented Jan 10, 2026

> > The request was to not rely on msvc-specific volatile behavior
>
> I'm curious to learn more about such behavioral differences

The bug has links to code scanning data and "why to fix" notes.

On re-re-reading, the main issue appears to be the missing read-acquire semantics when starting the spin down the list (fetching the current bucket list head). All pointers visible from that pointer should be stable and not modified until process teardown, but the initial read probably needs read-acquire. Then the compare-exchange should remove the reinterpret_cast<>. There was also some concern that threads walking the list would see rare but weird behavior, but I can't prove that to myself anymore; any pointer you read from the "head" and any pointer it points to are all validly allocated things that will live until the process terminates. The initial spin down the list should be "lock free," but it's missing the volatile-read-acquire semantics on the very first item in the list. This is where I wish we had a wistd::atomic<N*> so I wouldn't have to think about it.

I'll take another run.

@dunhor

dunhor commented Jan 10, 2026

Is it fair to say that the issue is that the reads of the variables are not atomic? I did a quick search and in lieu of having access to std::atomic, one suggestion is to use auto val = InterlockedCompareExchangePointer(&data, nullptr, nullptr);
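Spelled out, that pattern reads the pointer atomically by "exchanging" it with itself when it happens to be nullptr; either way the current value comes back, with a full memory barrier (stronger than the acquire that is actually needed). An illustrative sketch using the names from the original code:

// Atomic read of the bucket head via a no-op compare-exchange.
Node* head = static_cast<Node*>(::InterlockedCompareExchangePointer(
    reinterpret_cast<PVOID volatile*>(&m_hashArray[index]), nullptr, nullptr));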

@jonwis

jonwis commented Jan 10, 2026

I spent a bunch of time reading up on the (undocumented, but in winnt.h - something we need to fix) ReadPointerAcquire and ReadPointerRaw, which are just FORCEINLINE functions. ReadPointerAcquire is ReadAcquire, which is roughly "LONG temp; temp = *Source; return temp;" with a volatile-qualified Source. ReadPointerRaw is ReadRaw, which is roughly "LONG temp; temp = *(LONG *)Source; return temp;" ... so the "Raw" version discards the volatile qualifier before reading from the pointer.

So maybe the solution is to add our own "ReadPointerRaw/Acquire" that does the same thing? Something to noodle on.
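Something in this direction, perhaps (a sketch only; the wrapper name is made up):

// Hypothetical WIL-local typed wrapper over the winnt.h FORCEINLINE helper:
// an acquire-read of a pointer-sized field.
template <typename T>
T* read_pointer_acquire(T* volatile* source) WI_NOEXCEPT
{
    return static_cast<T*>(::ReadPointerAcquire(reinterpret_cast<PVOID volatile*>(source)));
}

// e.g. starting the spin down a bucket list:
// for (auto pNode = read_pointer_acquire(&m_hashArray[index]); pNode; pNode = pNode->pNext) { ... }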

