gh-135898: Add section to free-threading howto about memory usage (GH-143279)

nascheme · kumaraditya303 · miss-islington · commit 227155686cc8 · 2026-05-30T10:41:40.000Z
(cherry picked from commit 62a45fa) Co-authored-by: Neil Schemenauer <nas-github@arctrix.com> Co-authored-by: Kumar Aditya <kumaraditya@python.org>
diff --git a/Doc/howto/free-threading-python.rst b/Doc/howto/free-threading-python.rst
@@ -165,3 +165,132 @@ to false.  If the flag is true then the :class:`warnings.catch_warnings`
 context manager uses a context variable for warning filters.  If the flag is
 false then :class:`~warnings.catch_warnings` modifies the global filters list,
 which is not thread-safe.  See the :mod:`warnings` module for more details.
+
+
+Increased memory usage
+----------------------
+
+The free-threaded build typically uses more memory than the default build.
+There are several reasons for this, most of them consequences of design
+decisions made for the free-threaded build.
+
+
+All interned strings are immortal
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the default build of modern Python versions (since version 2.3), interning
+a string (e.g. with :func:`sys.intern`) does not make it immortal.  Instead,
+when the last reference to that string disappears, it is removed from the
+interned string table.  The free-threaded build behaves differently: every
+interned string becomes immortal and survives until interpreter shutdown.
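+
+The difference can be observed from Python.  The following is a minimal
+sketch rather than part of the official documentation.  It assumes the
+``Py_GIL_DISABLED`` config variable is available (Python 3.13 or later) and
+uses the private helper ``sys._is_immortal``, which is assumed to exist only
+on Python 3.14 and later; when the helper is missing, the sketch reports
+``None``.  The function name ``intern_and_check`` is hypothetical.
+

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds; it may be 0 or None otherwise.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def intern_and_check(text):
    """Intern ``text`` and report whether the interned string is immortal.

    The second item is None when sys._is_immortal is unavailable.
    """
    interned = sys.intern(text)
    is_immortal = getattr(sys, "_is_immortal", None)
    immortal = is_immortal(interned) if is_immortal is not None else None
    return interned, immortal


# Build the string at run time so it is not already a compile-time constant.
key = "".join(["user-", "12345"])
interned, immortal = intern_and_check(key)
print("free-threaded build:", free_threaded)
print("interned string immortal:", immortal)
```

+
+On a free-threaded build the reported value is expected to be ``True``.  A
+practical consequence is that interning an unbounded stream of unique strings
+(for example, keys derived from user input) keeps all of them alive until
+interpreter shutdown.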
+
+
+Non-GC objects have a larger object header
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The free-threaded build uses a different :c:type:`PyObject` structure.  In the
+default build, GC-related information is allocated before the
+:c:type:`PyObject` structure; in the free-threaded build, it is part of the
+normal object header.  For example, on the AMD64 platform, ``None`` uses 32
+bytes on the free-threaded build vs 16 bytes on the default build.  GC objects
+(such as dicts and lists) are the same size in both builds, since the
+free-threaded build needs no additional space for the GC information.
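+
+One way to compare the two builds is to run the same script under each and
+look at :func:`sys.getsizeof`.  This is a small sketch under the assumption
+that the ``Py_GIL_DISABLED`` config variable identifies the build; the exact
+numbers depend on the platform and Python version, so none are hard-coded.
+

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds; it may be 0 or None otherwise.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys.getsizeof() adds the garbage collector overhead for GC-managed objects,
# so the figures for lists and dicts are comparable across builds.
sizes = {
    "None": sys.getsizeof(None),
    "object()": sys.getsizeof(object()),
    "[] (GC object)": sys.getsizeof([]),
    "{} (GC object)": sys.getsizeof({}),
}

print("free-threaded build:", free_threaded)
for name, size in sizes.items():
    print(f"{name:>16}: {size} bytes")
```

+
+Running it with both interpreters should show the non-GC entries growing on
+the free-threaded build while the list and dict entries stay the same.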
+
+
+QSBR can delay freeing of memory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To implement lock-free data structures safely, the free-threaded build uses a
+safe memory reclamation (SMR) scheme known as quiescent state-based
+reclamation (QSBR).  Memory that backs data structures supporting lock-free
+access is released through QSBR, which defers the free operation rather than
+freeing the memory immediately.  Two examples of such data structures are the
+list object and the dictionary keys object.  See ``InternalDocs/qsbr.md`` in
+the CPython source tree for more details on how QSBR is implemented.  Running
+:func:`gc.collect` should cause all memory held by QSBR to actually be freed.
+Note that even when QSBR frees the memory, the underlying memory allocator may
+not immediately return it to the OS, so the resident set size (RSS) of the
+process might not decrease.
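+
+As an illustration of the :func:`gc.collect` advice, the sketch below drops a
+large dict of lists and then forces a collection.  The helper name
+``clear_and_collect`` is hypothetical, and the sketch does not measure memory
+itself; it only shows where the explicit collection would go.
+

```python
import gc


def clear_and_collect(container):
    """Empty a dict or list, then run a full collection.

    Returns the number of unreachable objects reported by gc.collect().
    """
    container.clear()
    return gc.collect()


# A dict of lists exercises both structures named above: the list object
# and the dictionary keys object.
cache = {i: [i] * 8 for i in range(50_000)}
unreachable = clear_and_collect(cache)
print("entries left:", len(cache), "unreachable found:", unreachable)
```

+
+Even after such a call, the RSS reported by the operating system may stay
+high, because the allocator can hold on to the freed memory.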
+
+
+mimalloc allocator vs pymalloc
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The default build will normally use the "pymalloc" memory allocator for small
+allocations (512 bytes or smaller).  The free-threaded build does not use
+pymalloc and allocates all Python objects using the "mimalloc" allocator.  The
+pymalloc allocator has the following properties that help keep memory usage
+low: small overhead per allocated block, effective prevention of memory
+fragmentation, and quick return of free memory to the operating system.  The
+mimalloc allocator also does well in these respects but can have somewhat
+more overhead.
+
+In the free-threaded build, mimalloc manages memory in a number of separate
+heaps (currently four).  For example, all GC supporting objects are allocated
+from their own heap.  Using separate heaps means that free memory in one heap
+cannot be used for an allocation that uses another heap.  Also, some heaps are
+configured to use QSBR (quiescent state-based reclamation) when freeing the
+memory that backs the heap (known as "pages" in mimalloc terminology).  The
+use of QSBR creates a delay between the point at which all memory blocks in a
+page have been freed and the point at which the page itself is released,
+either for new allocations or back to the OS.
+
+The mimalloc allocator also defers returning freed memory back to the OS.  You
+can reduce that delay by setting the environment variable
+:envvar:`!MIMALLOC_PURGE_DELAY` to ``0``.  Note that this will likely reduce
+the performance of the allocator.
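+
+Because the variable is read from the environment, it has to be set before
+the interpreter starts.  The sketch below, which is an illustration rather
+than a recommended pattern, launches a child interpreter with
+``MIMALLOC_PURGE_DELAY`` set to ``0``.  The helper name
+``run_with_purge_delay`` is hypothetical, and the child here only echoes the
+variable to show that it was passed through.
+

```python
import os
import subprocess
import sys


def run_with_purge_delay(code, delay="0"):
    """Run ``code`` in a child interpreter with MIMALLOC_PURGE_DELAY set."""
    env = dict(os.environ, MIMALLOC_PURGE_DELAY=delay)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# The child prints the value it sees; a real workload would go here instead.
result = run_with_purge_delay(
    "import os; print(os.environ['MIMALLOC_PURGE_DELAY'])"
)
print("child saw MIMALLOC_PURGE_DELAY =", result.stdout.strip())
```

+
+The setting only matters for builds that use mimalloc, and it trades
+allocator performance for a faster return of memory to the OS.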
+
+
+Free-threaded reference counting can cause objects to live longer
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the default build, when an object's reference count reaches zero, it is
+normally deallocated.  The free-threaded build uses "biased reference
+counting", with a fast-path for objects "owned" by the current thread and a
+slow path for other objects.  See :pep:`703` for additional details.  Any time
+an object's reference count ends up in a "queued" state, deallocation can be
+deferred.  The queued state is cleared from the "eval breaker" section of the
+bytecode evaluator.
+
+The free-threaded build also allows a different mode of reference counting,
+known as "deferred reference counting".  This mode is enabled by setting a flag
+on a per-object basis.  Deferred reference counting is enabled for the
+following kinds of objects:
+
+* module objects
+* module top-level functions
+* class methods defined in the class scope
+* descriptor objects
+* thread-local objects, created by :class:`threading.local`
+
+When deferred reference counting is enabled, references from Python function
+stacks are not added to the reference count.  This scheme reduces the overhead
+of reference counting, especially for objects used from multiple threads.
+Because the stack references are not counted, objects with deferred reference
+counting are not immediately freed when their internal reference count goes to
+zero.  Instead, they are examined by the next GC run and freed if no stack
+references to them are found.  In other words, these objects are freed by the
+GC rather than at the moment their reference count reaches zero, as happens
+for most other objects.
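+
+The effect on a module top-level function can be probed with a weak
+reference.  This is a hedged sketch: it assumes it is run as a script so that
+``probe`` is a top-level function, and the value seen before the collection
+depends on the build.
+

```python
import gc
import weakref


def probe():
    """A module top-level function, used only as a test subject."""


ref = weakref.ref(probe)
del probe  # drop the only strong reference

# On the default build the function is normally gone at this point; with
# deferred reference counting it may survive until the next GC run.
alive_before_gc = ref() is not None

gc.collect()
alive_after_gc = ref() is not None

print("alive before gc.collect():", alive_before_gc)
print("alive after gc.collect():", alive_after_gc)
```

+
+Code that relies on prompt finalization of such objects, for example through
+weak reference callbacks, may therefore behave differently between builds.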
+
+
+Per-thread reference counting can delay freeing objects
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To avoid contention on the reference count fields of frequently shared
+objects, the free-threaded build also uses "per-thread reference counting"
+for a few selected object types.  Rather than updating a single shared
+reference count, each thread maintains its own local reference count array,
+indexed by a unique id assigned to the object.  The true reference count is
+computed, by summing the per-thread counts, only when the object's local
+count drops to zero.  Per-thread reference counting is currently used for:
+
+* heap type objects (classes created in Python)
+* code objects
+* the ``__dict__`` of module objects
+
+Because the per-thread counts must be merged back to the object before it
+can be deallocated, objects using per-thread reference counting are
+typically freed later than they would be in the default build.  In
+particular, such an object is usually not freed until the thread that
+referenced it reaches a safe point (for example, in the "eval breaker"
+section of the bytecode evaluator) or exits.  Running :func:`gc.collect`
+will merge the per-thread counts and allow these objects to be freed.