Analyze and document libevwrapper atexit cleanup bug causing Python shutdown crashes #680

Copilot · 2026-01-31T22:18:44Z

Description

The atexit callback in libevreactor.py was registered with _global_loop=None at import time, causing it to receive None during shutdown instead of the actual loop instance. This prevented cleanup from running, leaving active libev callbacks that could execute during Python shutdown and crash.

This PR provides comprehensive analysis and test cases demonstrating the bug before implementing a fix.

Changes

LIBEV_SHUTDOWN_ANALYSIS.md: Comprehensive analysis document identifying the root cause and 6 different crash scenarios:
1. Race condition during thread join timeout
2. GIL state issues during interpreter finalization
3. Object lifecycle issues (callbacks accessing deallocated Python objects)
4. Connection cleanup not triggered (active connections with pending I/O)
5. Module deallocation order problems
6. Fork handling issues
tests/unit/io/test_libevreactor_shutdown.py: Test suite demonstrating the bug:
- test_atexit_callback_registered_with_none() - Shows atexit captures None instead of actual loop
- test_shutdown_crash_scenario_subprocess() - Subprocess test showing exact bug scenario
- test_callback_during_shutdown_scenario() - Documents crash conditions
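
For reviewers who want to see the behavior without the driver installed, a subprocess check in the spirit of the tests above might look like the sketch below. This is not the code in test_libevreactor_shutdown.py: the child script, the `_cleanup` stand-in, and the `"fake-loop"` value are all hypothetical, and only standard-library calls are used.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script mirroring the registration pattern described above.
CHILD = textwrap.dedent("""
    import atexit
    from functools import partial

    def _cleanup(loop):
        # Stand-in for the real cleanup: report what it was handed.
        print("cleanup received:", loop)

    _global_loop = None
    atexit.register(partial(_cleanup, _global_loop))

    # Later, the "loop" is created, but the registered partial is unchanged.
    _global_loop = "fake-loop"
""")

# Run the child to completion so its atexit handlers fire at real interpreter exit.
result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,
    text=True,
    timeout=30,
)
output = result.stdout.strip()
print(output)
```

Running the child in a separate interpreter matters here because atexit handlers only fire at real interpreter exit, which cannot be observed from inside the test process itself.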

Root Cause

Line 231 in libevreactor.py:

_global_loop = None
atexit.register(partial(_cleanup, _global_loop))  # Captures None!

The partial function captures None at import time. Later when _global_loop is set to a LibevLoop instance, the atexit callback still has None captured, so cleanup never runs.

Proposed Fix (for approval)

Replace the problematic line with a wrapper function that looks up the current _global_loop value at shutdown time:

def _atexit_cleanup():
    global _global_loop
    if _global_loop is not None:
        _cleanup(_global_loop)

_global_loop = None
atexit.register(_atexit_cleanup)  # Looks up current value at shutdown

This is a minimal 6-line change that ensures cleanup runs on the actual loop instance without requiring C extension modifications.
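
To illustrate the late lookup in isolation, here is a small sketch that calls the wrapper by hand before and after the global is set. `_cleanup` and the `"fake-loop"` value are placeholders, not the driver's real objects, and the `atexit.unregister` call exists only so the demo leaves no hook behind.

```python
import atexit

cleaned = []

def _cleanup(loop):
    # Placeholder for the driver's cleanup routine.
    cleaned.append(loop)

_global_loop = None

def _atexit_cleanup():
    # The global is read when the function runs, not when it is registered.
    if _global_loop is not None:
        _cleanup(_global_loop)

atexit.register(_atexit_cleanup)

_atexit_cleanup()                 # no loop yet: the guard skips cleanup
calls_before = list(cleaned)

_global_loop = "fake-loop"        # the loop is created later
_atexit_cleanup()                 # now cleanup sees the current value
calls_after = list(cleaned)

atexit.unregister(_atexit_cleanup)  # demo hygiene only
```

Because the wrapper only reads `_global_loop`, the `global` declaration in the proposed fix is not strictly required; it is harmless and documents the intent.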

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.

Original prompt

This section details the original issue you should resolve

<issue_title>Refactor libevwrapper cleanup situation</issue_title>
<issue_description>libevwrapper.py uses the atexit module to clean up its global loop.

From time to time we see the following crashes during Python shutdown:

Artifact test crashes on python shutdown (libevwrapper) scylla-cluster-tests#11713

Sporadic SIGSEGV (exit code -11) in Python when exiting the Alternator test scylladb#17564 (suspected; we don't collect the dumps there, so we can't confirm it)

For years we had a workaround of some sort in SCT that blindly cleared the atexit hooks.
We removed it because it broke other parts of SCT, and other code, that rely on atexit.

And now we are getting this crash from time to time.

There are two things that I think need to improve:

the ability to stop the libev loop, so it cannot wake up during Python shutdown

refactoring the code to not use a global loop and to do cleanup directly, without atexit

A completely vibe-coded example of part one; it compiled,
but I didn't have a clear path (nor time) to test it:
diff --git a/cassandra/io/libevwrapper.c b/cassandra/io/libevwrapper.c
index f32504fa3..0689bf3b9 100644
--- a/cassandra/io/libevwrapper.c
+++ b/cassandra/io/libevwrapper.c
@@ -6,6 +6,7 @@
 typedef struct libevwrapper_Loop {
     PyObject_HEAD
     struct ev_loop *loop;
+    ev_async async_watcher;
 } libevwrapper_Loop;

 static void
@@ -30,12 +31,24 @@ Loop_new(PyTypeObject *type, PyObject *args, PyObject *kwds) {
     return (PyObject *)self;
 };

+static void async_stop_cb(EV_P_ ev_async *w, int revents) {
+    ev_break(EV_A_ EVBREAK_ALL);
+}
+
+static PyObject *
+Loop_stop(libevwrapper_Loop *self, PyObject *args) {
+    ev_async_send(self->loop, &self->async_watcher);
+    Py_RETURN_NONE;
+}
+
 static int
 Loop_init(libevwrapper_Loop *self, PyObject *args, PyObject *kwds) {
     if (!PyArg_ParseTuple(args, "")) {
         PyErr_SetString(PyExc_TypeError, "Loop.__init__() takes no arguments");
         return -1;
     }
+    ev_async_init(&self->async_watcher, async_stop_cb);
+    ev_async_start(self->loop, &self->async_watcher);
     return 0;
 };

@@ -56,6 +69,7 @@ Loop_unref(libevwrapper_Loop *self, PyObject *args) {
 static PyMethodDef Loop_methods[] = {
     {"start", (PyCFunction)Loop_start, METH_NOARGS, "Start the event loop"},
     {"unref", (PyCFunction)Loop_unref, METH_NOARGS, "Unrefrence the event loop"},
+    {"stop", (PyCFunction)Loop_stop, METH_NOARGS, "Stop the event loop from any thread"},
     {NULL} /* Sentinel */
 };
</issue_description>

<agent_instructions>Please suggest a fix for clearing out the callbacks during shutdown that doesn't involve changing any third-party library, and wait for approval before implementation.</agent_instructions>

Comments on the Issue (you are @copilot in this section)
@fruch NOTE: We are using the libev backend across the board in testing; it's the most recommended and tested one from our POV. The crashes might be caused by misuse or by leaving open connections behind, but even if that happens we should prevent Python from crashing at shutdown. @fruch @absurdfarce are you familiar with such issues during Python shutdown?
I did a quick search, and found any references </comment_new>
<comment_new>@fruch
@absurdfarce thanks, it all fits my suspicion that we have in-flight requests (in the tests where it happens, the Cassandra/Scylla server is still up when we shut down the interpreter), and since we don't hold a reference to the libev loop or stop it, its callbacks can be called during or after shutdown.

Fixes Refactor libevwrapper cleanup situation #524


fruch · 2026-01-31T22:30:02Z

@copilot suggest more directions that might cause the issue.

Also write a test that showcases the problem, before implementing any fix.

- Added LIBEV_SHUTDOWN_ANALYSIS.md with detailed root cause analysis
- Documented 6 different crash scenarios that can occur
- Added test_libevreactor_shutdown.py to demonstrate the bug
- Tests show that atexit callback captures None instead of actual loop
- Analysis explains why this causes crashes during Python shutdown

Co-authored-by: fruch <340979+fruch@users.noreply.github.com>

Copilot · 2026-01-31T22:33:22Z

@copilot suggest more directions that might cause the issue.

Also write a test that showcases the problem, before implementing any fix.

I've created a comprehensive analysis in commit f56442c:

Analysis Document: LIBEV_SHUTDOWN_ANALYSIS.md identifies 6 crash scenarios:

1. Race condition during thread join - Event loop thread keeps running after 1s timeout
2. GIL state issues - PyGILState_Ensure() may fail during interpreter finalization
3. Object lifecycle issues - Callbacks access deallocated Python objects (the main crash cause)
4. Connection cleanup not triggered - Active connections with pending I/O fire callbacks during shutdown
5. Module deallocation order - libev accesses modules (logging, os, threading) that may be torn down
6. Fork handling issues - Child process atexit may try to clean parent's loop state
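
The first scenario relies on a general property of Python threads rather than anything libev-specific: `Thread.join(timeout=...)` returns when the timeout expires whether or not the thread has stopped. A minimal sketch, using a hypothetical `fake_event_loop` in place of the real loop thread and a shorter timeout than the 1s mentioned above:

```python
import threading

stop_requested = threading.Event()

def fake_event_loop():
    # Stand-in for a loop thread blocked inside its run call.
    stop_requested.wait()

loop_thread = threading.Thread(target=fake_event_loop, daemon=True)
loop_thread.start()

# join() with a timeout gives no signal that the thread is still alive;
# the caller has to check is_alive() afterwards.
loop_thread.join(timeout=0.1)
still_running_after_timeout = loop_thread.is_alive()

# Only an explicit stop signal lets the thread exit before shutdown continues.
stop_requested.set()
loop_thread.join()
stopped_after_signal = not loop_thread.is_alive()
```

In the sketch the thread is still alive after the timed join, which is the window in which a real loop thread could still fire callbacks while the interpreter is being torn down.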

Test File: tests/unit/io/test_libevreactor_shutdown.py demonstrates:

- The atexit callback captures None instead of the actual loop instance
- Cleanup receives None at shutdown and does nothing
- A subprocess test showing the exact bug scenario

Root Cause: Line 231 in libevreactor.py:

atexit.register(partial(_cleanup, _global_loop))  # _global_loop is None here!

The partial captures None, so cleanup never runs, leaving callbacks active during Python shutdown.

Recommended Fix (minimal change):

def _atexit_cleanup():
    global _global_loop
    if _global_loop is not None:
        _cleanup(_global_loop)

atexit.register(_atexit_cleanup)

This ensures cleanup looks up the current _global_loop value at shutdown time.

Initial plan

6c57ce4

Copilot AI assigned Copilot and fruch Jan 31, 2026

Copilot started work on behalf of fruch January 31, 2026 22:19 View session

Copilot AI changed the title ~~[WIP] Refactor libevwrapper cleanup process to improve stability~~ Fix libevwrapper atexit cleanup to prevent crashes during Python shutdown Jan 31, 2026

Copilot AI requested a review from fruch January 31, 2026 22:20

Copilot finished work on behalf of fruch January 31, 2026 22:20

Copilot started work on behalf of fruch January 31, 2026 22:30 View session

Copilot AI changed the title ~~Fix libevwrapper atexit cleanup to prevent crashes during Python shutdown~~ Analyze and document libevwrapper atexit cleanup bug causing Python shutdown crashes Jan 31, 2026

Copilot finished work on behalf of fruch January 31, 2026 22:34
