Skip to content

fix: add life support to handles cast to string_view#6092

Open
charlesbeattie wants to merge 3 commits into
pybind:masterfrom
charlesbeattie:master
Open

fix: add life support to handles cast to string_view#6092
charlesbeattie wants to merge 3 commits into
pybind:masterfrom
charlesbeattie:master

Conversation

@charlesbeattie

Copy link
Copy Markdown

Description

std::string_view is a reference type and therefore the handles are required to be put on life-support.

void my_func(std::vector<std::string_view> arg);
my_func(str(i).encode() for i in range(1000,1010))

Suggested changelog entry:

Add life support to handles cast to string_view.

@henryiii

henryiii commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Looks fine to me. Ran a review with Claude Opus 4.8, below. Some All of the "Minor suggestions" are wrong - we don't expect a changelog fragment (I need to see if I can edit AGENTS.md to fix that), Test location and Duplication shouldn't be there. The handle load_src vs src might be valid - I don't really know why both exist here, TBH. Ah, Claude knows. It searched history, and found it's a Python 2 left-over, there used to be a branch where it was reassigned. Doesn't affect this PR, I might try to remove it afterwards.

🤖 AI review 🤖

I've reviewed PR #6092. Here's my assessment.

Overview

The PR fixes a genuine use-after-free bug when casting Python objects to std::string_view (e.g. std::vector<std::string_view>). A string_view is a non-owning reference, so the underlying Python buffer must be kept alive for the duration of the call. Previously, only the UTF-16/32 path (cast.h:557) registered the Python object with loader_life_support. The three other load paths left the view dangling:

  • UTF-8 path (cast.h:529) — PyUnicode_AsUTF8AndSize returns a buffer cached inside src, so src must be kept alive.
  • bytes path (cast.h:609) — view points into src's buffer.
  • bytearray path (cast.h:622) — same.

The fix adds loader_life_support::add_patient(...) to each, gated on IsView.

Correctness — looks right

  • Patient selection is correct in each path. UTF-8/bytes/bytearray all reference a buffer owned by src, so add_patient(src) is the right object — distinct from the UTF-16/32 path, which correctly patients the temporary utfNbytes because the view points into that intermediate, not src. The author got this distinction right.
  • No cost for std::string. IsView is a compile-time template constant, so if (IsView) is eliminated for the owning-string instantiation. Consistent with the existing line 557 (the codebase can't use if constexpr due to C++11 support).
  • add_patient is safe here. It requires an active loader_life_support frame (type_caster_base.h:86); during normal argument loading one always exists. This matches the long-standing UTF-16/32 behavior, so the new paths don't introduce a new failure mode for the supported use case.
  • Free-threading safeadd_patient mutates a thread-local frame.

Test coverage — good

The test exercises all three newly-fixed paths (str → UTF-8, bytes, bytearray) via generators, which is exactly the trigger: each generator item is a temporary that would be freed after casting without life support. Under ASAN (run in CI) this reliably catches the regression. The hasattr/PYBIND11_HAS_STRING_VIEW guards correctly handle toolchains without string_view.

Minor suggestions (non-blocking)

  • Duplication: the if (IsView) { loader_life_support::add_patient(...); } block now appears 4×, each with a different patient. Not worth a helper given the differing arguments — fine as-is.
  • load_src vs src: line 510 sets handle load_src = src; and the UTF-8 buffer is read from load_src.ptr(), but the patient is add_patient(src). They're identical here, so harmless; using load_src would read marginally more consistently. Trivial.
  • Changelog: the PR body has a suggested changelog entry but the diff doesn't touch any changelog file — confirm whether this project expects an inline docs/changelog update or harvests it from the PR.
  • Test location: the new assertions live inside test_vector(doc). A dedicated test_string_view_lifesupport would be more discoverable, but adding to an existing file/function aligns with the project's "prefer fewer test files" convention.

Risks

Low. The change is narrowly scoped, touches no ABI/internals, adds no overhead to the common std::string path, and only extends an already-established pattern to the paths that were missing it. No security concerns.

Verdict: Solid, correct bug fix with appropriate test coverage. Approve, modulo confirming the changelog expectation.

🤖 Why does like 510 exist? 🤖

Good catch — line 510 (handle load_src = src;) is vestigial today. It has no functional purpose in the current code.

What it does now

load_src is initialized to src and never reassigned anywhere in the function (lines 514, 515, 523, 536 all just read it). So it's an exact alias for src. Since handle is just a non-owning wrapper around a PyObject* — no refcounting, trivial copy — having the second handle costs nothing and changes nothing at runtime.

Why it exists

It's a leftover from Python 2 support. Before the Python 2 removal (commit 6493f496), the function had a branch that reassigned it:

// Python 2 era:
temp = reinterpret_steal<object>(PyUnicode_FromObject(load_src.ptr()));
...
load_src = temp;   // now load_src != src

When a Python 2 str/bytes was passed where unicode was needed, the code created a temporary unicode object and pointed load_src at it, so the downstream PyUnicode_* calls operated on the temp rather than the original src. The two-handle split was meaningful then. That reassignment branch was deleted with Python 2, but the now-redundant alias was never cleaned up.

Relevance to this PR

It's harmless but slightly inconsistent with the PR: the new UTF-8 path reads the buffer from load_src.ptr() (line 523) yet patients src (line 530). They're the same object, so it's correct — but it does mean the "keep alive the thing the buffer came from" relationship is expressed through two different names.

Cleanest fix (independent of this PR, or folded into it): drop line 510 and replace the four load_src uses with src. That's a pure simplification with zero behavioral change. I'd keep it out of this PR unless the author wants a tidy-up commit, since it's orthogonal to the lifetime fix.

@henryiii henryiii changed the title Add life support to handles cast to string_view. fix: add life support to handles cast to string_view Jun 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants