Skip to content

Add wait_event_timing: Oracle-style wait event instrumentation#1

Open
DmitryNFomin wants to merge 4 commits into
masterfrom
wait-event-timing
Open

Add wait_event_timing: Oracle-style wait event instrumentation#1
DmitryNFomin wants to merge 4 commits into
masterfrom
wait-event-timing

Conversation

@DmitryNFomin
Copy link
Copy Markdown
Owner

@DmitryNFomin DmitryNFomin commented Mar 22, 2026

Summary

This patch adds opt-in per-backend instrumentation that captures the duration of every wait event reported via pgstat_report_wait_start/end. Two capture levels are exposed: aggregated per-event histograms (stats) and a 10046-style per-session trace ring (trace).

The feature is gated by a configure-time flag (--enable-wait-event-timing / meson -Dwait_event_timing=true) and a runtime GUC (wait_event_capture, SUSET, default off). When the GUC is off, the inline wait-event hot path is one global load + one predicted-not-taken branch — overhead vs vanilla is within run-to-run noise.

Motivation

PostgreSQL surfaces a rich taxonomy of wait events through pg_stat_activity.wait_event, but only as instantaneous snapshots. There is no in-tree way to answer:

  • "How long do my backends actually spend in each wait?"
  • "What was the wait sequence of the last N events in this session?"
  • "Do these timings differ across two query plans?"

External tools either sample at coarse resolution (pg_wait_sampling, default 10 ms) or pay a heavy debug-trap cost when reading PGPROC->wait_event_info via a hardware watchpoint (~200–300 ns per transition → ~29 % TPS on high-transition workloads).

The closest upstream analogue is Oracle's wait event interface: a clock_gettime-grade timestamp at every wait start/end, with per-event aggregates and a per-session trace ring (the "10046" trace).

What this patch adds

GUCs

GUC type default summary
wait_event_capture SUSET, enum off off / stats / trace
wait_event_timing_max_tranches POSTMASTER, int 192 per-backend tranche slots for LWLock timing
wait_event_trace_ring_size_kb POSTMASTER, int 4096 per-backend trace ring size, power-of-2 in 8 ... 32768
  • off — feature inert at runtime.
  • stats — accumulates per-(backend, event) (count, total_ns, max_ns, log2-bucket histogram) in a per-backend DSA slot.
  • trace — additionally pushes each completed wait into a fixed-size DSA-backed ring buffer per session (lossless single-writer / lossless external reader; wraps when exhausted).

SQL surface

-- per-(backend, event) stats; NULL pid = all backends
SELECT * FROM pg_stat_get_wait_event_timing(NULL);
-- columns: pid, backend_type, procnumber, wait_event_type, wait_event,
--          calls, total_time_ms, avg_time_us, max_time_us, histogram int8[]

-- trace ring for the calling backend
SELECT * FROM pg_get_backend_wait_event_trace();

-- cross-backend trace read (procnumber, not pid; survives owner exit)
SELECT * FROM pg_get_wait_event_trace(<procnumber>);

-- reset
SELECT pg_stat_reset_wait_event_timing(<pid>);
SELECT pg_stat_reset_wait_event_timing_all();

-- one-shot sweep of rings whose owning backend died ungracefully
SELECT pg_stat_clear_orphaned_wait_event_rings();

Design notes

Off-mode codegen

pgstat_report_wait_start/end are static inline in src/include/utils/wait_event.h. The inline carries a minimal gate:

*(volatile uint32 *) my_wait_event_info = wait_event_info;
#ifdef USE_WAIT_EVENT_TIMING
    if (unlikely(wait_event_capture != WAIT_EVENT_CAPTURE_OFF))
        pgstat_report_wait_start_timing(wait_event_info);
#endif

The body (writes-disabled check, lazy DSA attach, INSTR_TIME_SET_CURRENT, current-event write) lives in out-of-line pgstat_report_wait_start_timing / pgstat_report_wait_end_timing. Keeping the gate body out-of-line prevents host functions (LWLockAcquire, LockAcquireExtended, XLogInsert, ...) from being inflated by the unreachable on-mode code, which would otherwise cost register spills and worse instruction layout across the binary (~2 % off-mode penalty on the inlined variant, eliminated by this layout).

Lazy DSA attach

Backends do not touch the timing DSA until the first time they observe wait_event_capture != OFF. The DSA itself is not created until the first backend that enables capture. Invariants:

  • MyProc->lwWaiting check at the start of lazy_attach() — never attempt DSA work while queued on an LWLock (DSA primitives can themselves queue on an LWLock, which would assert).
  • wait_event_timing_writes_disabled flag raised by a before_shmem_exit callback registered at first attach, so wait events dispatched during the proc_exit cascade (e.g. by dsm_backend_shutdown follow-ons) do not touch DSA mappings that have already been torn down.
  • wait_event_info is re-published after lazy_attach() returns — nested wait events during the attach (dsa_attach can contend on an internal LWLock) clear it.

Trace ring lifecycle

The per-backend trace ring is a DSA segment owned by a WaitEventTraceRingHeader. When a backend exits, the segment is left behind for an external reader. Orphaned rings are swept by:

  • the next backend that re-attaches with the same procnumber (resets the ring), or
  • bgwriter's periodic sweep, or
  • the explicit SQL function above.

The trace ring records a fixed-size struct per wait: (seq, wait_start_ns, wait_end_ns, wait_event_info, query_id).

Lock-free reads

Per-backend storage has a single writer (the owning backend). External readers (pg_stat_get_wait_event_timing(NULL), pg_get_wait_event_trace(procno)) read counters and the trace ring without locks; they tolerate torn reads by snapshotting (seq, wait_start, wait_end) and re-checking seq. Cross-backend reset uses an atomic reset_generation; the owning backend clears its own counters at its next wait_end — no shared mutex.

Measured overhead

CCX43 (16 dedicated AMD EPYC-Milan vCPU / 64 GB RAM), pgbench -S 16c/8j/30s, 4 GB shared_buffers, 5 runs per config, mean TPS:

BASELINE WET / off WET / stats WET / trace
pgbench -S 164,859 166,840 (+1.20 %) 164,126 (−0.44 %) 165,687 (+0.50 %)
pgbench TPC-B 17,569 17,487 (−0.46 %) 17,240 (−1.87 %) 17,314 (−1.45 %)

Off-mode is within noise. Stats / trace cost is below 2 % on typical pgbench and peaks at ~4–5 % under wait-event-saturated workloads (small shared_buffers with hot-row contention).

Full per-run numbers, eviction-stress with shared_buffers = 32 MB (W1–W4), stress regression (T1–T6, E1–E7), streaming replication with wait_event_capture = trace on both primary and standby, ASAN-instrumented battery, and make check-world results are in the latest test-results comment.

Commits

8b77f0f739  Enable --enable-wait-event-timing on one Linux CI task
94a8ef68cb  Add tests for wait_event_capture instrumentation
06e3033604  Add documentation for wait_event_capture instrumentation
98b9e93243  Add per-backend wait-event timing and tracing instrumentation

The main feature commit is large because the recording path, the SQL/SRF surface, the trace ring management, and the proc init hooks are tightly coupled; splitting further would not yield commits that build or pass tests independently.

Files changed

New:

  • src/backend/utils/activity/wait_event_timing.c — DSA management, lazy attach, recording, SQL functions, orphan sweep
  • src/include/utils/wait_event_timing.h — public structs, GUC externs

Modified:

  • src/include/utils/wait_event.h — out-of-line gate at pgstat_report_wait_start/end
  • src/backend/storage/lmgr/proc.c — init/cleanup of per-backend timing slot
  • src/backend/storage/ipc/ipci.c — DSA registration
  • src/backend/utils/activity/backend_status.c — query_id pointer hookup for trace records
  • src/include/catalog/pg_proc.dat — SQL function registration
  • src/backend/utils/misc/guc_tables.c, guc_parameters.dat, postgresql.conf.sample — GUC definitions
  • configure.ac, meson.build, meson_options.txt, src/include/pg_config.h.in — compile-time option
  • src/backend/utils/activity/Makefile, meson.build — build wiring
  • doc/src/sgml/monitoring.sgml, config.sgml — docs
  • src/test/regress/... — regression test additions

Open questions for review

  1. Configure flag vs always-on. Currently behind --enable-wait-event-timing (matching --enable-injection-points, --enable-cassert precedent). With the out-of-line gate the runtime off-mode cost is negligible; the compile flag could be dropped if there is consensus that the binary-size and feature-discoverability tradeoff is acceptable.

  2. Trace record size. 64 bytes per record is generous (current fields fit in ~48). Reducing to 32 doubles ring depth at the cost of dropping query_id from the trace record.

  3. Tranche hash sizing. wait_event_timing_max_tranches = 192 is a per-backend cap. Extensions that register many tranches can blow past this; today this raises a one-time WARNING and increments a per-backend overflow counter. An auto-grow path is possible but adds DSA allocation cost to a steady-state hot path.

  4. bgwriter sweep cadence. Orphan rings are swept once per bgwriter cycle today. A long-idle bgwriter delays cleanup; alternatively the sweep could be hooked into pg_stat_reset_* callers too.

Related work

DmitryNFomin added a commit to DmitryNFomin/pg_wait_tracer that referenced this pull request Mar 23, 2026
Add section to Performance documenting the proposed PostgreSQL patch
(DmitryNFomin/postgres#1) that adds Oracle-style internal wait event
instrumentation with near-zero overhead (<1%) vs hardware watchpoints
(6-29%).

Benchmark results on same environment (Hetzner cx43, 8 vCPU):
- TPC-B: <1% vs 6% (hardware watchpoint)
- SELECT-only worst case: <1% vs 29%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@DmitryNFomin DmitryNFomin force-pushed the wait-event-timing branch 4 times, most recently from 7069864 to 7cadcee Compare March 24, 2026 10:14
DmitryNFomin pushed a commit that referenced this pull request Apr 13, 2026
…ocation

The WaitEventTraceState array was 4 MB per backend allocated unconditionally
at startup via ShmemInitStruct (400 MB at max_connections=100). Replace with
DSA-backed lazy allocation: a small WaitEventTraceControl struct (~8 bytes
per backend) lives in fixed shmem, and the 4 MB ring buffer is allocated
via dsa_create/dsa_allocate only when a backend SETs wait_event_trace = on.

Key changes:
- Add WaitEventTraceControl with dsa_handle, LWLock, per-backend dsa_pointers
- Add LWTRANCHE_WAIT_EVENT_TRACE_DSA for the control LWLock
- Lazy DSA creation on first use (any backend), attach on subsequent backends
- GUC assign hook (assign_wait_event_trace) triggers allocation on enable
- Cleanup via before_shmem_exit callback (runs before dsm_backend_shutdown
  detaches DSA segments, avoiding the use-after-detach crash that would
  occur if dsa_free ran from ProcKill's on_shmem_exit callback)
- Fix pg_stat_get_wait_event_trace proisstrict: was defaulting to true,
  causing NULL argument (own backend) to skip the C function entirely

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
DmitryNFomin pushed a commit that referenced this pull request Apr 17, 2026
Address all 20 issues identified by cross-checking Gemini 3.1, Opus 4.6,
and GPT 5.4 reviews against the actual PR code.  Each fix follows the
approved plan in crosscheck_review_fix_3.md.

Security / correctness fixes:

  Issue #1 (UAF in trace ring reads):
  - Remove pg_stat_get_wait_event_timing_by_query() entirely (OID 9951,
    view, docs, tests) -- cross-backend DSA ring scanning was the primary
    use-after-free surface.
  - Restrict pg_stat_get_wait_event_trace() to own-session only; the
    backend_id argument is accepted for forward compat but ignored.
  - Wrap DSA pointer mutations (attach/detach) in WaitEventTraceCtl->lock
    LW_EXCLUSIVE to protect external readers and fix 32-bit torn pointers
    (also resolves Issue #11).
  - Document the ring buffer as an external lock-free transport with a
    seqlock reader contract in wait_event_timing.h and monitoring.sgml.

  Issue #6 (cross-backend reset race):
  - Add per-backend LWLock to WaitEventTimingState, following the
    pgstat_reset_entry() pattern from pgstat_shmem.c.
  - Writers acquire LW_SHARED on the hot path; reset acquires LW_EXCLUSIVE.
  - Init and exit paths individually zero fields instead of memset on the
    whole struct, preserving the LWLock.
  - Register LWTRANCHE_WAIT_EVENT_TIMING in lwlocklist.h.

Build / startup fixes:

  Issue #2: Replace non-existent wait_classes.h include with
  wait_event_types.h in wait_event_timing.h.

  Issue #3 (config-file trace broken): Add explicit
  wait_event_trace_attach() calls after ResourceOwner init in
  auxprocess.c (after CreateAuxProcessResourceOwner) and postinit.c
  (after process_settings) to handle inherited GUC values that do not
  re-fire assign hooks after fork.

  Issue #10 (non-timing build silent suppression): Emit WARNING before
  forcing wait_event_timing/wait_event_trace to off in check hooks for
  non-timing builds (source < PGC_S_INTERACTIVE path).

  Issue #14 (EXEC_BACKEND init order): Add comment in
  pgstat_set_wait_event_timing_storage() confirming SubPostmasterMain
  calls CreateSharedMemoryAndSemaphores() before InitProcess().

Query lifecycle / phase separation:

  Issue #4 (QUERY_END broken for multi-statement):
  - Hybrid approach: emit QUERY_END for old query and QUERY_START for new
    query in pgstat_report_query_id() transitions (backend_status.c).
  - Keep existing QUERY_END in send_ready_for_query (postgres.c) for the
    last-query-to-idle boundary.
  - Covers simple single/multi-stmt, extended, prepared, and pipeline
    protocols without double-emit.

  Issue #5 (phase separation): Add TRACE_EXEC_END marker (type=4) emitted
  from ExecutorEnd() in execMain.c, separating execution from post-exec
  cleanup (commit, WAL flush, lock release).  Also resolves Issue #13
  (extended protocol phase misattribution).

Overflow and dependency warnings:

  Issue #7: One-time WARNING on first LWLock hash overflow (>192 tranches).
  Issue #8: WARNING in assign_wait_event_trace when track_activities is
  off; document the dependency in config.sgml.
  Issue #12: Add flat_overflow_count field + one-time WARNING when eventId
  exceeds per-class slot count.

Header cleanup and code generation:

  Issue #15: Move static const arrays and internal inline functions from
  wait_event_timing.h to wait_event_timing.c as file-static.  Convert
  trace marker inlines to extern functions with a shared
  wait_event_trace_write_marker() helper.

  Issue #16 (parallel array validation): Extend generate-wait_event_types.pl
  to emit wait_event_timing_data.c (mapping arrays) and the three sizing
  #defines into wait_event_types.h, generated from wait_event_names.txt.
  Eliminates hand-maintained parallel arrays entirely.  Update Makefile,
  meson.build, and pgindent exclusions.

Documentation:

  Issue #17: Document InitMaterializedSRF usage as acceptable for both
  the timing SRF (bounded result set) and trace SRF (own-session only,
  deferred max_records parameter).

  Issue #18: Document ring wrap limitation, polling contract, and
  st_query_id fallback in wait_event_timing.h and monitoring.sgml.

  Issue #19 nits: Remove duplicate header include, remove install/
  entries from .gitignore, add tranche_id 65535 academic comment, add
  eventId-equals-array-index comment, verify OID range (9950/9952/9953).

Testing:

  Issue #20:
  - Add trace lifecycle regression test (off/on/query/read/off cycle).
  - Add auxiliary process timing test (verify checkpointer/bgwriter/
    walwriter accumulate non-zero wait event counts).
  - Add test_lwlock_hash_overflow() to test_wait_event_stress module:
    registers >192 custom LWLock tranches to exercise hash overflow
    path and verify WARNING emission.
  - Document that default Cirrus CI already exercises non-timing build
    path via wait_event_timing_1.out alternate expected output.

Issues resolved by other fixes (no separate action needed):
  Issue #9: Resolved by Issue #1 (by_query view removed).
  Issue #11: Resolved by Issue #1C (LWLock around DSA pointer mutations).
  Issue #13: Resolved by Issue #5 (EXEC_START/EXEC_END markers).

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@DmitryNFomin DmitryNFomin reopened this Apr 25, 2026
DmitryNFomin added a commit that referenced this pull request Apr 25, 2026
Four small, independently-justified fixes from the consolidated review
in review_5.md.  All verified by `make check` (249/249 passing,
including the wait_event_timing regression test).

#1  Stale docstring on pg_stat_reset_wait_event_timing
    (wait_event_timing.c)
    Two stacked docstrings: the first described an obsolete
    backend_id-based API ("0 = own, -1 = all") that does not exist;
    the function takes a PID.  Merge them into one block that uses
    the correct PID semantics and preserves the async-protocol +
    reset_count-polling guidance from the obsolete one.

#2  Spurious wait_event_* includes in ipci.c
    Two #includes (wait_event.h, wait_event_timing.h) referenced no
    symbols in ipci.c.  The WaitEvent*ShmemCallbacks externs come via
    storage/subsystems.h -> subsystemlist.h.  Removing both leaves
    ipci.c building cleanly and shrinks its include graph.

#3  Missing WARNING for wait_event_capture=trace + compute_query_id=off
    (wait_event_timing.c, config.sgml)
    assign_wait_event_capture already warns when track_activities is
    off; mirror the same warning for compute_query_id, since markers
    silently skip when query_id is 0.  Verified live: setting
    compute_query_id=off then wait_event_capture=trace now produces
    the expected WARNING + HINT pointing at pg_stat_statements as a
    common enabler.

#4  Trace doc only described QUERY markers; EXEC markers were silent
    (config.sgml, monitoring.sgml)
    The trace ring emits four marker types, not two.  ExecStart/
    ExecEnd bracket every executor invocation and are NOT gated on
    track_activities, while QueryStart/QueryEnd cover inter-statement
    and idle-transition boundaries that the executor cannot.  Restate
    the prereqs in config.sgml as an itemized list per family, update
    monitoring.sgml's view description and consumer guidance to name
    ExecStart as the primary attribution signal and QueryStart as the
    fallback for boundaries the executor cannot mark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 11, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable from the
patch, and review_6.md issue #8 (test coverage of the orphan path)
is structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

Two new SQL functions are added in src/backend/utils/activity/
wait_event_timing.c (oids 9962, 9963):

  pg_get_wait_event_trace(procnumber int4)
      Canonical cross-backend reader.  Reads slots in OWNED state
      (live writer) and ORPHANED state (writer has exited, ring
      preserved post-mortem) uniformly.  Returns empty for FREE
      slots, out-of-range procnumbers, or slots whose generation
      counter kept changing during the snapshot (very rare slot
      reassignment race -- bounded retry, then empty).

  pg_get_wait_event_trace_for_pid(pid int4)
      Convenience wrapper that resolves pid -> procNumber via
      BackendPidGetProc().  Cannot read ORPHANED slots by design:
      a dying backend's pid is removed from the proc array on
      exit, so the pid -> procNumber mapping does not survive
      backend exit.  Callers who want post-mortem readability
      use pg_get_wait_event_trace directly with procNumber.

Both functions implement the cross-backend reader protocol
documented on WaitEventTraceControl in
src/include/utils/wait_event_timing.h: snapshot the slot's
generation, read slot state atomically, acquire
WaitEventTraceCtl->lock in LW_SHARED, resolve ring_ptr,
memcpy the records[] array into a local palloc'd buffer
(~4 MB upper bound), release the lock, recheck generation,
retry on mismatch, then process records via the per-record
seqlock protocol off the lock.  Shared helper
emit_wait_event_trace_for_procnumber() factors the common
snapshot+emit code; the two SRFs are thin wrappers.

Privilege model: REVOKE EXECUTE from PUBLIC and GRANT to
pg_read_all_stats in system_views.sql, matching the privilege
model of the session-local view pg_backend_wait_event_trace
(established in review_5.md issue #27, commit 34015c1).
Wait-event traces are stats-class data and the symmetric
privilege story is the least surprising to monitoring tooling.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql: smoke tests
    that both SRFs are callable from SQL and produce sensible
    results for own session, unknown pid (empty), and
    out-of-range procnumbers (empty).  Both .out variants
    (timing and stub builds) updated.

  * src/test/modules/test_wait_event_stress/t/
    001_orphan_roundtrip.pl: end-to-end TAP test that
    exercises the orphan-readable code path.  A writer
    session enables trace, generates wait events, captures
    its procnumber, and disconnects; a separate long-lived
    reader session then calls pg_get_wait_event_trace on
    the writer's procnumber and asserts the records are
    visible after the writer has exited.  Verifies
    pg_stat_clear_orphaned_wait_event_rings() releases the
    orphan and that a subsequent read returns empty.

    The test keeps a dedicated reader BackgroundPsql open
    for the entire run; this prevents the reader's own
    procNumber from being a recycle target for the writer's
    procNumber, which would otherwise trigger
    wait_event_trace_clear_orphan_at_init() and reclaim the
    orphan before the test can read it.

    pg_sleep calls are wrapped in a DO block with PERFORM
    so they do not emit empty result rows that would
    contaminate the captured procnumber.  pg_stat_get_
    wait_event_timing exposes backend_id (procNumber + 1
    by PG convention), so the test subtracts 1 before
    passing to pg_get_wait_event_trace.

Documentation: new entries in monitoring.sgml under
monitoring-stats-funcs for both SRFs, plus a clarifying
update to the pg_get_backend_wait_event_trace entry that
points readers at the new cross-backend SRFs.

Catversion bump for the two new pg_proc OIDs.

Tested on Rocky Linux 9.7 + gcc 11.5 with
--enable-wait-event-timing --enable-debug --enable-cassert
--enable-tap-tests: full src/test/regress suite (249 tests)
passes, test_wait_event_stress regress passes, and the new
TAP test passes (6/6 subtests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 12, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.  Implements the documented snapshot pattern:
      sample slot->generation, atomic state read, LW_SHARED on
      WaitEventTraceCtl->lock, dsa_get_address, palloc only the
      slice actually used by write_pos and memcpy under lock,
      release lock, re-sample generation, retry on mismatch
      (bounded by WET_READ_MAX_RETRIES=3, then NOTICE + empty so
      callers can distinguish a stable empty result from a lost
      race), then per-record seqlock processing off the lock.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to pg_read_all_stats
  in system_views.sql, matching the privilege model of the session-
  local view pg_backend_wait_event_trace.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow both used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions (post-PG-17 procnumber introduction).

Defensive fix in wait_event_trace_attach (review_6.md issue #5
collateral):
  The Assert(state_now != WET_TRACE_SLOT_ORPHANED) at attach time
  was too strict.  A backend can legitimately observe its own slot
  in the ORPHANED state if a later before_shmem_exit callback
  (pgstat_io_flush_cb during proc_exit shutdown) contends on an
  LWLock, which emits a wait event that lands in
  pgstat_report_wait_end_timing -> wait_event_trace_attach after
  wait_event_trace_before_shmem_exit has already run and
  transitioned the slot.  The Assert crashes parallel workers
  reliably on exit when capture=trace.  Replaced with defensive
  fall-through: if ORPHANED is observed at attach time, do not
  re-attach (the ring is now post-mortem; the writer invariant
  must hold).  This adds in-test coverage for the issue #5 case
  via the parallel-worker scenario below.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests for pg_get_wait_event_trace: own session via
      procnumber-keyed path, out-of-range procnumbers (empty),
      capture=off case (empty).
    - Permission gating: a fresh non-pg_read_all_stats role
      attempting to call the function gets ERROR (caught via
      DO block with explicit insufficient_privilege handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Two scenarios:
    (1) Plain-backend orphan roundtrip: writer enables trace,
        emits waits, captures its procnumber, disconnects;
        long-lived reader (separate session, pinning its own
        slot to prevent procnumber recycle of the writer's
        slot) reads the ORPHANED ring and asserts the events
        are present; pg_stat_clear_orphaned_wait_event_rings
        releases it; subsequent read empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation): a forced-parallel query (debug_parallel_
        query=on plus zero parallel costs) runs and exits;
        the leader and at least one worker each leave an
        ORPHANED slot.  The reader iterates procnumber 0..150
        and asserts at least two slots have readable records.

    Race-hardening: max_connections=100 in the test cluster
    plus a permanently-OWNED reader slot minimise the window
    in which an unrelated new backend could recycle a
    just-exited writer's procnumber and trigger
    wait_event_trace_clear_orphan_at_init().

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying
update to the pg_get_backend_wait_event_trace entry pointing
readers at the cross-backend SRF.

Catversion bump for the new pg_proc OID and the procnumber
column rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in
both, test_wait_event_stress regress and TAP (7 subtests) pass
in the timing build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 12, 2026
…_trace_attach

The Assert(state_now != WET_TRACE_SLOT_ORPHANED) inside
wait_event_trace_attach assumed that ORPHANED could never be
observed at attach time, on the reasoning that
pgstat_set_wait_event_timing_storage() at backend init calls
wait_event_trace_clear_orphan_at_init() which demotes any
inherited orphan to FREE before any wait-event capture path can
run.

That invariant holds at backend INIT.  It does NOT hold during
backend EXIT: after wait_event_trace_before_shmem_exit (registered
as a before_shmem_exit callback) has transitioned the slot to
ORPHANED, a LATER before_shmem_exit callback can contend on an
LWLock that emits a wait event.  In particular, pgstat_io_flush_cb
during proc_exit shutdown takes shared locks that may block under
load; LWLockAcquire's pgstat_report_wait_start/end machinery then
lands in pgstat_report_wait_end_timing -> wait_event_trace_attach
after my_wait_event_trace has been cleared.  The Assert fires and
the backend aborts.

This is reliably reproducible with parallel workers: a forced-
parallel query under wait_event_capture = trace produces multiple
workers that all exit at end-of-parallel-query, each of which hits
the Assert during shmem_exit.

Replace the Assert with a defensive runtime check: if state ==
ORPHANED at attach time, skip the re-attach.  We are dying, the
ring is now post-mortem data for cross-backend readers, and the
single-writer invariant must hold; the wait event that triggered
this attach path is not traced (a fully acceptable tradeoff -- we
are in the middle of shutting down).

review_6.md issue #5 asked for exactly this kind of defense-in-
depth at module boundaries; that review framed it as a style nit
("Assert is debug-only, defensive runtime check is better"), but
the parallel-worker shutdown sequence promotes it to a correctness
bug that crashes the backend in cassert builds.

Test coverage: the parallel-worker scenario in
src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
(added in the follow-up commit for review_6.md issue #1)
exercises this exact shutdown path; the same TAP test crashed the
parallel worker before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 12, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.  Implements the documented snapshot pattern:
      sample slot->generation, atomic state read, LW_SHARED on
      WaitEventTraceCtl->lock, dsa_get_address, palloc only the
      slice actually used by write_pos and memcpy under lock,
      release lock, re-sample generation, retry on mismatch
      (bounded by WET_READ_MAX_RETRIES = 3, then NOTICE + empty
      so callers can distinguish a stable empty result from a
      lost race), then per-record seqlock processing off the
      lock.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to pg_read_all_stats
  in system_views.sql, matching the privilege model of the session-
  local view pg_backend_wait_event_trace.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions (post-PG-17 procnumber introduction).

Retry-loop buffer sizing:
  emit_wait_event_trace_for_procnumber pallocs local_records sized
  for the slice actually needed (min(write_pos, RING_SIZE)) rather
  than always 4 MB.  On retry, if a different owner has taken the
  slot with a larger write_pos, the buffer is pfree'd and
  re-allocated at the new size; the buffer is never shrunk across
  retries.  This avoids both the 4 MB-per-call cost on
  nearly-empty rings (the common case for short-lived backends
  and post-mortem orphan reads) and the buffer-overflow that
  would occur if we kept a too-small buffer across a
  size-growing retry.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests for pg_get_wait_event_trace: own session via
      procnumber-keyed path, out-of-range procnumbers (empty),
      capture=off case (empty).
    - Permission gating: a fresh non-pg_read_all_stats role
      attempting to call the function gets ERROR (caught via
      DO block with an explicit insufficient_privilege handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Two scenarios:
    (1) Plain-backend orphan roundtrip.  A writer enables
        trace, emits waits, captures its procnumber via the
        procnumber column of pg_stat_get_wait_event_timing,
        disconnects.  A long-lived reader session (pinning its
        own slot against procnumber recycle) reads the
        ORPHANED ring and asserts the events are present;
        pg_stat_clear_orphaned_wait_event_rings() releases
        the orphan; subsequent read returns empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation).  A forced-parallel query
        (debug_parallel_query=on plus zero parallel costs)
        runs and exits; the leader and at least one worker
        each leave an ORPHANED slot.  We count them via the
        admin sweep -- cheaper and more explicit than
        iterating procnumbers and pg_get_wait_event_trace'ing
        each one (the read path itself is already covered by
        scenario 1).

    Race-hardening: max_connections=100 in the test cluster
    plus a permanently-OWNED reader slot minimise the window
    in which an unrelated new backend could recycle a
    just-exited writer's procnumber and trigger
    wait_event_trace_clear_orphan_at_init().

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying update
to the pg_get_backend_wait_event_trace entry pointing readers at
the cross-backend SRF.

Catversion bump for the new pg_proc OID and the procnumber column
rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in both,
test_wait_event_stress regress and TAP (7 subtests) pass in the
timing build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 12, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to pg_read_all_stats
  in system_views.sql, matching the privilege model of the session-
  local view pg_backend_wait_event_trace.

Cross-backend reader protocol implemented in
emit_wait_event_trace_for_procnumber():

  1. Cheap unlocked state probe; FREE -> empty result.
  2. Acquire WaitEventTraceCtl->lock in LW_SHARED.  All slot
     transitions take LW_EXCLUSIVE, so the slot's identity,
     state, and ring_ptr are stable for the duration of the
     iteration (the generation-counter retry that external
     readers would do becomes unnecessary for this in-tree
     caller).
  3. Resolve ring_ptr via dsa_get_address, read write_pos.
  4. Iterate every live ring index, applying the per-record
     seqlock protocol AGAINST SHARED MEMORY: read seq_before,
     barrier, structure-copy the record into a local result
     buffer, barrier, read seq_after.  Skip records with odd
     seq_before (writer mid-record at start of read) or with
     seq_before != seq_after (writer completed a write cycle
     during our read; payload may be torn).
  5. Release the lock.
  6. Walk the local result buffer and emit rows into the
     tuplestore -- the only potentially I/O-expensive part,
     deliberately done outside the lock so spills do not
     extend lock-hold.

Why per-record seqlock against shared memory (not against a
memcpy'd local copy of the ring): the protocol requires
seq_before and seq_after to read THE SAME memory location at TWO
DIFFERENT TIMES, with the payload read in between.  A single
up-front memcpy followed by two reads of seq from the local copy
sees the value frozen at copy time, the check degenerates to a
no-op, and torn reads from concurrent writers cannot be detected.
For an own-session reader (pg_get_backend_wait_event_trace) the
concurrent-writer case does not arise; the cross-backend reader
must do the protocol correctly to handle live OWNED slots safely.

Lock-hold is O(records_in_range) in the iterator (similar wall
time to a single 4 MB memcpy of the full ring, ~1 ms on modern
hardware) with no I/O; the tuplestore work runs after release.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests for pg_get_wait_event_trace: own session via
      procnumber-keyed path, out-of-range procnumbers (empty),
      capture=off case (empty).
    - Permission gating: a fresh non-pg_read_all_stats role
      attempting to call the function gets ERROR (caught via
      a DO block with an explicit insufficient_privilege
      handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Three scenarios:
    (1) Plain-backend orphan roundtrip.  Writer enables trace,
        emits waits, captures its procnumber from the
        procnumber column of pg_stat_get_wait_event_timing,
        disconnects.  A long-lived reader (pinning its own
        slot against procnumber recycle) reads the ORPHANED
        ring and asserts the events are present;
        pg_stat_clear_orphaned_wait_event_rings releases the
        orphan; subsequent read returns empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation).  A forced-parallel query
        (debug_parallel_query=on plus zero parallel costs)
        runs and exits; leader and at least one worker each
        leave an ORPHANED slot, verified via the admin sweep
        count.
    (3) OWNED-slot read with a concurrent live writer.  A
        long-lived writer emits a steady stream of PgSleep
        events in a 1000-iteration loop while the reader
        calls pg_get_wait_event_trace(writer_procnumber)
        repeatedly.  Every observed row must be well-formed
        (no NULL/empty wait_event_type or wait_event, no
        negative duration); a single malformed row would
        indicate the seqlock failed to detect a torn write.
        This is the scenario the per-record seqlock protocol
        was rewritten to handle correctly.

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying update
to the pg_get_backend_wait_event_trace entry pointing readers at
the cross-backend SRF.

Catversion bump for the new pg_proc OID and the procnumber column
rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in both,
test_wait_event_stress regress and TAP (10 subtests) pass in the
timing build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 12, 2026
pg_atomic_write_u64 has no barrier semantics (see the header
comment in atomics.h: "No barrier semantics."), so on weak-memory
architectures (ARM, POWER) the writer's pg_atomic_write_u64 to
write_pos can be reordered after the subsequent plain stores to
rec->seq.  A cross-backend reader observing the new write_pos
would then iterate i = pos and read rec[pos & MASK].seq while it
still holds the PREVIOUS ring cycle's even value.  The reader's
seqlock check would falsely pass (the seq is stable across the
read because the writer hasn't yet bumped it to odd from this
cycle's perspective) and the reader would emit a stale-but-valid
record from the previous ring cycle, mislabeled with the new
ring index.

On x86 this is masked by TSO; on ARM/POWER it is a real data-
visibility bug that manifests as the cross-backend reader
returning rows tied to the wrong ring positions.

Add pg_write_barrier between the pg_atomic_write_u64(&write_pos,
pos+1) and the first rec->seq = odd store at both writer sites
(the wait-event hot path and the marker writer).  This is the
release half of an acquire/release pair; the matching
pg_read_barrier on the reader side lives in
emit_wait_event_trace_for_procnumber where the cross-backend
SRF reads write_pos before iterating the per-record seq fields.

This barrier is on the per-event hot path but adds only a CPU
barrier (no atomic RMW, no cache-line trip), so the cost is
within the noise of the existing seqlock writes already done by
the function.

No new tests: this is a weak-memory ordering fix that x86-only
testing cannot exhibit.  Catching it would require ARM/POWER
stress testing of the cross-backend reader against an actively
writing backend (the TAP scenario "OWNED-slot read with a
concurrent live writer" added in the follow-up commit for
review_6.md issue #1 is the closest thing in tree).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 12, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to
  pg_read_all_stats in system_views.sql, matching the privilege
  model of the session-local view pg_backend_wait_event_trace.

Cross-backend reader protocol implemented in
emit_wait_event_trace_for_procnumber():

  1. Unlocked state probe; FREE -> empty result.
  2. Acquire WaitEventTraceCtl->lock in LW_SHARED.  All slot
     transitions take LW_EXCLUSIVE, so the slot's identity,
     state, and ring_ptr are stable for the duration of the
     iteration.
  3. Resolve ring_ptr via dsa_get_address, read write_pos.
  4. pg_read_barrier() to pair with the writer's
     pg_write_barrier between its pg_atomic_write_u64(&write_pos)
     and the rec->seq update (added in the predecessor commit).
     Without this acquire half of the pair, weak-memory readers
     could observe the new write_pos while still seeing the
     previous ring cycle's even seq value and emit stale-but-
     valid records with wrong ring indices.
  5. Iterate every live ring index, applying the per-record
     seqlock protocol AGAINST SHARED MEMORY: read seq_before,
     barrier, structure-copy the record into a local result
     buffer, barrier, read seq_after.  Skip records with odd
     seq_before (writer mid-record at start of read) or with
     seq_before != seq_after (writer completed a write cycle
     during our read; payload may be torn).
  6. Release the lock.
  7. Walk the local result buffer and emit rows into the
     tuplestore -- the only potentially I/O-expensive part,
     deliberately done outside the lock so spills do not
     extend lock-hold.

Why per-record seqlock against shared memory (not against a
memcpy'd local copy of the ring): the protocol requires
seq_before and seq_after to read THE SAME memory location at TWO
DIFFERENT TIMES, with the payload read in between.  A single
up-front memcpy followed by two reads of seq from the local copy
sees the value frozen at copy time, the check degenerates to a
no-op, and torn reads from concurrent writers cannot be detected.
For an own-session reader (pg_get_backend_wait_event_trace) the
concurrent-writer case does not arise; the cross-backend reader
must do the protocol correctly to handle live OWNED slots safely.

Lock-hold is O(records_in_range) in the iterator (similar wall
time to a single 4 MB memcpy of the full ring, ~1 ms on modern
hardware) with no I/O; the tuplestore work runs after release.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests for pg_get_wait_event_trace: own session via
      procnumber-keyed path, out-of-range procnumbers (empty),
      capture=off case (empty).
    - Permission gating: a fresh non-pg_read_all_stats role
      attempting to call the function gets ERROR (caught via
      a DO block with an explicit insufficient_privilege
      handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Three scenarios:
    (1) Plain-backend orphan roundtrip.  Writer enables trace,
        emits waits, captures its procnumber, disconnects.
        A long-lived reader (pinning its own slot against
        procnumber recycle) reads the ORPHANED ring and
        asserts the events are present;
        pg_stat_clear_orphaned_wait_event_rings releases the
        orphan; subsequent read returns empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation).  A forced-parallel query
        (debug_parallel_query=on plus zero parallel costs)
        runs and exits; leader and at least one worker each
        leave an ORPHANED slot, verified via the admin sweep
        count.
    (3) OWNED-slot read with a concurrent live writer.  A
        long-lived writer emits a steady stream of PgSleep
        events in a 1000-iteration loop while the reader
        calls pg_get_wait_event_trace(writer_procnumber)
        repeatedly.  Every observed row must be well-formed
        (no NULL/empty wait_event_type or wait_event, no
        negative duration); a single malformed row would
        indicate the seqlock failed to detect a torn write.
        This is the scenario the per-record seqlock protocol
        was written to handle correctly.

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying update
to the pg_get_backend_wait_event_trace entry pointing readers at
the cross-backend SRF.

Catversion bump for the new pg_proc OID and the procnumber column
rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in both,
test_wait_event_stress regress and TAP (10 subtests) pass in the
timing build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 12, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to
  pg_read_all_stats in system_views.sql, matching the privilege
  model of the session-local view pg_backend_wait_event_trace.

Cross-backend reader protocol implemented in
emit_wait_event_trace_for_procnumber():

  1. Unlocked state probe; FREE -> empty result.
  2. Acquire WaitEventTraceCtl->lock in LW_SHARED.  All slot
     transitions take LW_EXCLUSIVE, so the slot's identity,
     state, and ring_ptr are stable for the duration of the
     iteration.
  3. Resolve ring_ptr via dsa_get_address, read write_pos.
  4. pg_read_barrier() to pair with the writer's
     pg_write_barrier between its pg_atomic_write_u64(&write_pos)
     and the rec->seq update (added in the predecessor commit).
     Without this acquire half of the pair, weak-memory readers
     could observe the new write_pos while still seeing the
     previous ring cycle's even seq value and emit stale-but-
     valid records with wrong ring indices.
  5. Iterate every live ring index, applying the per-record
     seqlock protocol AGAINST SHARED MEMORY: read seq_before,
     barrier, structure-copy the record into a local result
     buffer, barrier, read seq_after.  Skip records with odd
     seq_before (writer mid-record at start of read) or with
     seq_before != seq_after (writer completed a write cycle
     during our read; payload may be torn).
  6. Release the lock.
  7. Walk the local result buffer and emit rows into the
     tuplestore -- the only potentially I/O-expensive part,
     deliberately done outside the lock so spills do not
     extend lock-hold.

Why per-record seqlock against shared memory (not against a
memcpy'd local copy of the ring): the protocol requires
seq_before and seq_after to read THE SAME memory location at TWO
DIFFERENT TIMES, with the payload read in between.  A single
up-front memcpy followed by two reads of seq from the local copy
sees the value frozen at copy time, the check degenerates to a
no-op, and torn reads from concurrent writers cannot be detected.
For an own-session reader (pg_get_backend_wait_event_trace) the
concurrent-writer case does not arise; the cross-backend reader
must do the protocol correctly to handle live OWNED slots safely.

Lock-hold is O(records_in_range) in the iterator (similar wall
time to a single 4 MB memcpy of the full ring, ~1 ms on modern
hardware) with no I/O; the tuplestore work runs after release.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests for pg_get_wait_event_trace: own session via
      procnumber-keyed path, out-of-range procnumbers (empty),
      capture=off case (empty).
    - Permission gating: a fresh non-pg_read_all_stats role
      attempting to call the function gets ERROR (caught via
      a DO block with an explicit insufficient_privilege
      handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Three scenarios:
    (1) Plain-backend orphan roundtrip.  Writer enables trace,
        emits waits, captures its procnumber, disconnects.
        A long-lived reader (pinning its own slot against
        procnumber recycle) reads the ORPHANED ring and
        asserts the events are present;
        pg_stat_clear_orphaned_wait_event_rings releases the
        orphan; subsequent read returns empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation).  A forced-parallel query
        (debug_parallel_query=on plus zero parallel costs)
        runs and exits; leader and at least one worker each
        leave an ORPHANED slot, verified via the admin sweep
        count.
    (3) OWNED-slot read with a concurrent live writer.  A
        long-lived writer emits a steady stream of PgSleep
        events in a 1000-iteration loop while the reader
        calls pg_get_wait_event_trace(writer_procnumber)
        repeatedly.  Every observed row must be well-formed
        (no NULL/empty wait_event_type or wait_event, no
        negative duration); a single malformed row would
        indicate the seqlock failed to detect a torn write.
        This is the scenario the per-record seqlock protocol
        was written to handle correctly.

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying update
to the pg_get_backend_wait_event_trace entry pointing readers at
the cross-backend SRF.

Catversion bump for the new pg_proc OID and the procnumber column
rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in both,
test_wait_event_stress regress and TAP (10 subtests) pass in the
timing build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 13, 2026
The lazy-attach hot-path entry points pgstat_wait_event_timing_lazy_
attach() and wait_event_trace_attach() both allocate memory via
dsa_attach / dsa_create / dsa_allocate_extended, which ultimately
reach MemoryContextAlloc.  MemoryContextAlloc Assert-fails on
"CritSectionCount == 0 || allowInCritSection", so any backend
whose very first wait event after wait_event_capture is enabled
lands inside a critical section is killed by the assertion in
cassert builds (or risks heap corruption in production builds).

Reproducer: a parallel worker that has not yet emitted any wait
events scans a heap page, hits BufferSetHintBits16 ->
XLogSaveBufferForHint -> XLogInsert (which opens a critical
section) -> LWLockAcquire, and the wait-end-timing hook routes to
the lazy attach inside the critical section.  Reliably triggered
by the new "parallel-worker orphan roundtrip" TAP scenario added
for review_6.md issue #1.

Fix: check CritSectionCount at the top of each lazy-attach
function and return early if non-zero.  This silently drops the
in-flight wait event (it is neither timed nor traced) but keeps
the backend alive; the next wait event outside any critical
section retries the attach and succeeds, after which the hot
path no longer routes through the lazy-attach function.

Critical sections are brief by design (a few dozen instructions
typically), so the dropped wait events are themselves a few-
nanosecond window of unobservable activity -- well within the
quantisation noise of any wait-event-based profiler.  This
matches the behaviour PG already has for other "skipped under
critical section" optimisations.

Both timing and trace lazy attaches are fixed symmetrically; the
two functions share a near-identical hot path entry shape and
both need the same guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 13, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to
  pg_read_all_stats in system_views.sql, matching the privilege
  model of the session-local view pg_backend_wait_event_trace.

Cross-backend reader protocol implemented in
emit_wait_event_trace_for_procnumber():

  1. Unlocked state probe; FREE -> empty result.
  2. Acquire WaitEventTraceCtl->lock in LW_SHARED.  All slot
     transitions take LW_EXCLUSIVE, so the slot's identity,
     state, and ring_ptr are stable for the duration of the
     iteration.
  3. Resolve ring_ptr via dsa_get_address, read write_pos.
  4. pg_read_barrier() to pair with the writer's
     pg_write_barrier() between its pg_atomic_write_u64(&write_pos)
     and the rec->seq update (added in the predecessor commit).
  5. Iterate every live ring index, applying the per-record
     POSITION-ENCODED IDENTITY SEQLOCK against shared memory:

         expected_seq = (uint32)(i * 2 + 2);
         seq_before = rec_shared->seq;
         pg_read_barrier();
         if (seq_before != expected_seq) continue;
         local_copy = *rec_shared;
         pg_read_barrier();
         seq_after = rec_shared->seq;
         if (seq_after != expected_seq) continue;

     The writer encodes ring position into the seq value: position
     pos uses (pos*2+1) for mid-write and (pos*2+2) for complete.
     After RING_SIZE writes the slot wraps and the same memory
     location gets a new numerically-distinct seq.  A parity-only
     seqlock check (skip odd, accept stable even) is INSUFFICIENT
     for this layout in the cross-backend case: if the writer
     advanced write_pos but the rec->seq update has not yet
     propagated to this CPU's cache, the reader sees the previous
     cycle's stable even seq, both seq_before and seq_after match,
     and a record belonging to the previous cycle gets emitted
     with the new ring_index -- silent attribution corruption.
     The identity-against-expected check rejects all four failure
     modes (stale prior cycle, mid-write current cycle, wrapped-
     past-us next cycle, torn write completed mid-read).

  6. Release the lock.
  7. Walk the local result buffer and emit rows into the
     tuplestore.  Tuplestore work (potential disk spill) happens
     outside the lock so it does not extend lock-hold.

Why per-record seqlock against shared memory (not a memcpy'd
local copy): the protocol requires seq_before and seq_after to
read THE SAME memory location at TWO DIFFERENT TIMES.  A single
up-front memcpy followed by two reads of seq from the local copy
sees the value frozen at copy time; the check degenerates to a
no-op.

Lock-hold is O(records_in_range) shared-memory loads (similar
wall time to a single 4 MB memcpy of the full ring, ~1 ms on
modern hardware) with no I/O.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests: own session via procnumber-keyed path,
      out-of-range procnumbers (empty), capture=off case (empty).
    - Permission gating: non-pg_read_all_stats role gets ERROR
      (caught via DO block with explicit handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Three scenarios:
    (1) Plain-backend orphan roundtrip.  Writer enables trace,
        emits waits, captures procnumber, disconnects.  Reader
        reads ORPHANED ring; admin sweep releases; subsequent
        read empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation).  Forced-parallel query with 1M rows and
        ORDER BY (routes tuples through shm_mq queues, ensuring
        workers emit MessageQueueSend/MessageQueueReceive waits
        and lazily allocate their trace rings before exit).
        Asserts pg_stat_clear_orphaned_wait_event_rings returns
        >= 2 (leader + at least one worker).  Includes a drain-
        wait loop to avoid racing with worker before_shmem_exit
        callbacks.
    (3) OWNED-slot read with a concurrent live writer.  Writer
        emits ~1000 PgSleep waits in a tight loop; reader calls
        pg_get_wait_event_trace(writer_procnumber) repeatedly.
        Every observed row must be well-formed (no NULL/empty
        wait_event_type, wait_event, no negative duration).

WaitEventTraceControl protocol documentation in wait_event_timing.h
is rewritten to describe the position-encoded identity seqlock
(not the previous parity-only memcpy-then-check pattern, which
was structurally broken for cross-process readers) so external
authors building ASH/AWR background workers have correct guidance.

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying update
to pg_get_backend_wait_event_trace entry.

Catversion bump for the new pg_proc OID and the procnumber column
rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in both,
test_wait_event_stress regress and TAP (11 subtests) pass in the
timing build (10 stress reruns).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 13, 2026
The lazy-attach hot-path entry points pgstat_wait_event_timing_lazy_
attach() and wait_event_trace_attach() both allocate memory via
dsa_attach / dsa_create / dsa_allocate_extended, which ultimately
reach MemoryContextAlloc.  MemoryContextAlloc Assert-fails on
"CritSectionCount == 0 || allowInCritSection", so any backend
whose very first wait event after wait_event_capture is enabled
lands inside a critical section is killed by the assertion in
cassert builds (or risks heap corruption in production builds).

Reproducer: a parallel worker that has not yet emitted any wait
events scans a heap page, hits BufferSetHintBits16 ->
XLogSaveBufferForHint -> XLogInsert (which opens a critical
section) -> LWLockAcquire, and the wait-end-timing hook routes
to the lazy attach inside the critical section.  Reliably
triggered by the parallel-worker scenario added in the
follow-up commit for review_6.md issue #1.

Fix: check CritSectionCount at the top of each lazy-attach
function and return early if non-zero.  This silently drops the
in-flight wait event (it is neither timed nor traced) but keeps
the backend alive; the next wait event outside any critical
section retries the attach and succeeds, after which the hot
path no longer routes through the lazy-attach function.

Critical sections are brief by design, so the dropped wait
events are themselves a few-nanosecond window of unobservable
activity -- well within the quantisation noise of any wait-
event-based profiler.

Both timing and trace lazy attaches are fixed symmetrically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 13, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to
  pg_read_all_stats in system_views.sql, matching the privilege
  model of the session-local view pg_backend_wait_event_trace.

Cross-backend reader protocol implemented in
emit_wait_event_trace_for_procnumber():

  1. Unlocked state probe; FREE -> empty result.
  2. Allocate worst-case result buffer BEFORE taking the lock so
     the palloc -- which can bottom out in a glibc mmap syscall
     for the ~5 MB worst-case size -- runs without holding the
     WaitEventTraceCtl lock.
  3. Acquire WaitEventTraceCtl->lock in LW_SHARED.  All slot
     transitions take LW_EXCLUSIVE, so the slot's identity,
     state, and ring_ptr are stable for the duration of the
     iteration.
  4. Resolve ring_ptr via dsa_get_address, read write_pos.
  5. Iterate every live ring index, applying the per-record
     POSITION-ENCODED IDENTITY SEQLOCK against shared memory:

         expected_seq = (uint32)(i * 2 + 2);
         seq_before = rec_shared->seq;
         pg_read_barrier();
         if (seq_before != expected_seq) continue;
         local_copy = *rec_shared;
         pg_read_barrier();
         seq_after = rec_shared->seq;
         if (seq_after != expected_seq) continue;

     The writer encodes ring position into the seq value:
     position pos uses (pos*2+1) for mid-write and (pos*2+2) for
     complete.  After RING_SIZE writes the slot wraps and gets a
     new numerically-distinct seq.  A parity-only seqlock check
     (skip odd, accept stable even) is INSUFFICIENT for this
     layout in the cross-backend case: if the writer advanced
     write_pos but the rec->seq update has not yet propagated to
     this CPU's cache, the reader sees the previous cycle's
     stable even seq, both seq_before and seq_after match, and
     a record belonging to the previous cycle gets emitted with
     the new ring_index -- silent attribution corruption.  The
     identity-against-expected check rejects all four failure
     modes (stale prior cycle, mid-write current cycle, wrapped-
     past-us next cycle, torn write completed mid-read).

  6. Release the lock.
  7. Walk the local result buffer and emit rows into the
     tuplestore.  Tuplestore work (potential disk spill) happens
     outside the lock so it does not extend lock-hold.

Why per-record seqlock against shared memory (not a memcpy'd
local copy): the protocol requires seq_before and seq_after to
read THE SAME memory location at TWO DIFFERENT TIMES.  A single
up-front memcpy followed by two reads of seq from the local copy
sees the value frozen at copy time; the check degenerates to a
no-op.

Lock-hold is O(records_in_range) shared-memory loads (similar
wall time to a single 4 MB memcpy of the full ring, ~1 ms on
modern hardware) with no I/O and no syscalls.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions.

A writer-side INJECTION_POINT("wait-event-trace-after-write-pos")
is added between the pg_atomic_write_u64(&write_pos) and the
rec->seq update; it is compiled out unless --enable-injection-
points is set, and is intended as a hook for a future regression
test of the position-encoded identity seqlock.  Wiring a TAP
test through this is fiddly enough to defer to a follow-up; the
identity check is correct by construction and the protocol is
fully documented on WaitEventTraceControl.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests: own session via procnumber-keyed path,
      out-of-range procnumbers (empty), capture=off case (empty).
    - Permission gating: non-pg_read_all_stats role gets ERROR
      (caught via DO block with explicit handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Three scenarios:
    (1) Plain-backend orphan roundtrip.  Writer enables trace,
        emits waits, captures procnumber, disconnects.  Reader
        reads ORPHANED ring; admin sweep releases; subsequent
        read empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation).  Forced-parallel query with 1M rows and
        ORDER BY (routes tuples through shm_mq queues, ensuring
        workers emit MessageQueueSend/MessageQueueReceive waits
        and lazily allocate their trace rings before exit).
        Asserts pg_stat_clear_orphaned_wait_event_rings returns
        >= 2 (leader + at least one worker).  Includes a drain-
        wait loop to avoid racing with worker before_shmem_exit
        callbacks.
    (3) OWNED-slot read with a concurrent live writer.  Writer
        emits ~1000 PgSleep waits in a tight loop; reader calls
        pg_get_wait_event_trace(writer_procnumber) repeatedly.
        Every observed row must be well-formed.

WaitEventTraceControl protocol documentation in wait_event_timing.h
is rewritten to describe the position-encoded identity seqlock
(not the previous parity-only memcpy-then-check pattern, which
was structurally broken for cross-process readers) so external
authors building ASH/AWR background workers have correct guidance.

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying update
to pg_get_backend_wait_event_trace entry.

Catversion bump for the new pg_proc OID and the procnumber column
rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in both,
test_wait_event_stress regress and TAP (11 subtests) pass in the
timing build with 5/5 stress reruns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 13, 2026
…reader

review_6.md raised the canonical -hackers objection against the
orphan-persistence lifecycle landed in commit 0dba096: the patch
preserves data (trace rings whose owning backend has exited) that
no in-tree code can actually read.  Without an in-tree consumer the
new orphan-memory cost is paid for hypothetical external workers,
the parallel-worker observability claim is unverifiable, and
review_6.md issue #8 (test coverage of the orphan path) is
structurally impossible because nothing in tree can exercise it.

This commit ships the in-tree consumer.

  pg_get_wait_event_trace(procnumber int4)
      Cross-backend reader keyed by procnumber.  Reads slots in
      OWNED state (live writer) and ORPHANED state (writer has
      exited, ring preserved post-mortem) uniformly.  Returns
      empty for FREE slots, out-of-range procnumbers, or no
      records.

  Privileges: REVOKE'd from PUBLIC and GRANT'ed to
  pg_read_all_stats in system_views.sql, matching the privilege
  model of the session-local view pg_backend_wait_event_trace.

Cross-backend reader protocol implemented in
emit_wait_event_trace_for_procnumber():

  1. Unlocked state probe; FREE -> empty result.
  2. Allocate worst-case result buffer BEFORE taking the lock so
     the palloc -- which can bottom out in a glibc mmap syscall
     for the ~5 MB worst-case size -- runs without holding the
     WaitEventTraceCtl lock.
  3. Acquire WaitEventTraceCtl->lock in LW_SHARED.  All slot
     transitions take LW_EXCLUSIVE, so the slot's identity,
     state, and ring_ptr are stable for the duration of the
     iteration.
  4. Resolve ring_ptr via dsa_get_address, read write_pos.
  5. Iterate every live ring index, applying the per-record
     POSITION-ENCODED IDENTITY SEQLOCK against shared memory:

         expected_seq = (uint32)(i * 2 + 2);
         seq_before = rec_shared->seq;
         pg_read_barrier();
         if (seq_before != expected_seq) continue;
         local_copy = *rec_shared;
         pg_read_barrier();
         seq_after = rec_shared->seq;
         if (seq_after != expected_seq) continue;

     The writer encodes ring position into the seq value:
     position pos uses (pos*2+1) for mid-write and (pos*2+2) for
     complete.  After RING_SIZE writes the slot wraps and gets a
     new numerically-distinct seq.  A parity-only seqlock check
     (skip odd, accept stable even) is INSUFFICIENT for this
     layout in the cross-backend case: if the writer advanced
     write_pos but the rec->seq update has not yet propagated to
     this CPU's cache, the reader sees the previous cycle's
     stable even seq, both seq_before and seq_after match, and
     a record belonging to the previous cycle gets emitted with
     the new ring_index -- silent attribution corruption.  The
     identity-against-expected check rejects all four failure
     modes (stale prior cycle, mid-write current cycle, wrapped-
     past-us next cycle, torn write completed mid-read).

  6. Release the lock.
  7. Walk the local result buffer and emit rows into the
     tuplestore.  Tuplestore work (potential disk spill) happens
     outside the lock so it does not extend lock-hold.

Why per-record seqlock against shared memory (not a memcpy'd
local copy): the protocol requires seq_before and seq_after to
read THE SAME memory location at TWO DIFFERENT TIMES.  A single
up-front memcpy followed by two reads of seq from the local copy
sees the value frozen at copy time; the check degenerates to a
no-op.

Lock-hold is O(records_in_range) shared-memory loads (similar
wall time to a single 4 MB memcpy of the full ring, ~1 ms on
modern hardware) with no I/O and no syscalls.

Rename of the procnumber-exposing column on the existing SRFs:
  pg_stat_get_wait_event_timing and pg_stat_get_wait_event_timing_
  overflow used to expose backend_id (procNumber + 1, 1-based
  legacy convention).  This patch renames the column to procnumber
  and makes it 0-based so the value is directly usable as the
  argument to pg_get_wait_event_trace().  The 0-based form matches
  modern PG conventions.

A writer-side INJECTION_POINT("wait-event-trace-after-write-pos")
is added between the pg_atomic_write_u64(&write_pos) and the
rec->seq update; it is compiled out unless --enable-injection-
points is set, and is intended as a hook for a future regression
test of the position-encoded identity seqlock.  Wiring a TAP
test through this is fiddly enough to defer to a follow-up; the
identity check is correct by construction and the protocol is
fully documented on WaitEventTraceControl.

Test coverage:

  * src/test/regress/sql/wait_event_timing.sql
    - Smoke tests: own session via procnumber-keyed path,
      out-of-range procnumbers (empty), capture=off case (empty).
    - Permission gating: non-pg_read_all_stats role gets ERROR
      (caught via DO block with explicit handler).

  * src/test/modules/test_wait_event_stress/t/001_orphan_roundtrip.pl
    Three scenarios:
    (1) Plain-backend orphan roundtrip.  Writer enables trace,
        emits waits, captures procnumber, disconnects.  Reader
        reads ORPHANED ring; admin sweep releases; subsequent
        read empty.
    (2) Parallel-worker orphan roundtrip (the patch's stated
        motivation).  Forced-parallel query with 1M rows and
        ORDER BY (routes tuples through shm_mq queues, ensuring
        workers emit MessageQueueSend/MessageQueueReceive waits
        and lazily allocate their trace rings before exit).
        Asserts pg_stat_clear_orphaned_wait_event_rings returns
        >= 2 (leader + at least one worker).  Includes a drain-
        wait loop to avoid racing with worker before_shmem_exit
        callbacks.
    (3) OWNED-slot read with a concurrent live writer.  Writer
        emits ~1000 PgSleep waits in a tight loop; reader calls
        pg_get_wait_event_trace(writer_procnumber) repeatedly.
        Every observed row must be well-formed.

WaitEventTraceControl protocol documentation in wait_event_timing.h
is rewritten to describe the position-encoded identity seqlock
(not the previous parity-only memcpy-then-check pattern, which
was structurally broken for cross-process readers) so external
authors building ASH/AWR background workers have correct guidance.

Documentation: monitoring.sgml entry for pg_get_wait_event_trace
under monitoring-stats-funcs, including the SQL one-liner for
pid-keyed lookup against live backends, plus a clarifying update
to pg_get_backend_wait_event_trace entry.

Catversion bump for the new pg_proc OID and the procnumber column
rename.

Tested on Rocky Linux 9.7 + gcc 11.5 with --enable-cassert and
--enable-tap-tests in both --enable-wait-event-timing and stub
build configurations: src/test/regress (249 tests) passes in both,
test_wait_event_stress regress and TAP (11 subtests) pass in the
timing build with 5/5 stress reruns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@DmitryNFomin DmitryNFomin force-pushed the wait-event-timing branch 4 times, most recently from febf636 to a6016e7 Compare May 13, 2026 15:49
DmitryNFomin added a commit that referenced this pull request May 13, 2026
…raph

The monitoring.sgml entry for pg_backend_wait_event_trace
contained a slot-lifecycle paragraph claiming that orphan-
persistence is "what makes the view above usable for parallel-
worker observability."  The reviewer correctly pointed out that
the view is session-local: pg_backend_wait_event_trace always
reads my_wait_event_trace (the calling backend's own ring) and
never touches an orphan in another procnumber's slot.  Orphan-
persistence has zero effect on the view's behaviour.

The actual beneficiaries of orphan-persistence are the cross-
backend interfaces:

  * pg_get_wait_event_trace(procnumber) -- the in-tree SRF
    added for review_6 issue #1, which reads OWNED and
    ORPHANED slots uniformly.
  * External background workers (ASH/AWR/10046-style ring
    readers) following the cross-backend snapshot pattern
    documented on WaitEventTraceControl.

Rewrite the paragraph to:

  * Drop the misleading "view above" attribution.
  * Add an explicit "This does not change the behaviour of this
    view, which always reads the calling backend's own ring and
    is unaffected by orphan-state slots belonging to other
    procnumbers." sentence, so a reader following the link from
    a parallel-worker observability question doesn't form the
    wrong mental model.
  * Reference pg_get_wait_event_trace by name and link to the
    monitoring-stats-funcs section.
  * Reattribute the parallel-worker-exit-fast rationale
    correctly: it explains *why* orphan-persistence matters in
    the first place (the data would otherwise be lost), not why
    this view is suddenly different.

No code changes.  Doc build (check-tabs, check-nbsp) and main
regress (249 tests) still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@DmitryNFomin DmitryNFomin force-pushed the wait-event-timing branch 4 times, most recently from 68bdde2 to d1553b9 Compare May 14, 2026 12:01
DmitryNFomin added a commit that referenced this pull request May 14, 2026
…s/comments

review_6.md issue #9 flagged that documentation comments
over-reference "future ASH/AWR/10046-style background workers"
as justification for the cross-backend trace surface.  On
-hackers this reads as speculative -- naming hypothetical
consumers to motivate present code.  PG style prefers factual
statements about what the code does, with neutral framing for
consumers ("any extension that reads via the snapshot pattern").

The framing is also now doubly stale: since review_6 was
written we landed pg_get_wait_event_trace (review_6.md issue
#1) as the in-tree cross-backend consumer.  The "future" tense
is wrong as well as speculative.

Sweep through wait_event_timing.h, wait_event_timing.c, and
monitoring.sgml.  In each speculative reference replace the
"future ASH/AWR/10046-style background workers" phrasing with
neutral framing that:

  * names the in-tree consumer (pg_get_wait_event_trace) where
    that is what callers should use today,
  * refers to "extensions following the snapshot pattern
    documented on WaitEventTraceControl" for the
    direct-shared-memory case,
  * drops the speculative consumer enumeration entirely.

References to "10046-style trace" as a descriptive label for
the technique (e.g. "Per-session wait event trace ring buffer
(10046-style)" on WaitEventTraceState) are retained -- that is
a recognised term for the per-session-event-trace
profiling style, used here as a descriptor of the design, not
as a forward-looking promise about a particular consumer.

Files touched:

  * doc/src/sgml/monitoring.sgml: pg_backend_wait_event_trace
    cross-backend-tools paragraph, pg_get_wait_event_trace
    description, and pg_stat_clear_orphaned_wait_event_rings
    description.
  * src/include/utils/wait_event_timing.h: slot-lifecycle
    comment block (3 spots).
  * src/backend/utils/activity/wait_event_timing.c:
    wait_event_trace_before_shmem_exit doc, the
    pg_backend_wait_event_trace SRF doc, the
    pg_get_wait_event_trace SRF doc, and the
    pg_stat_clear_orphaned_wait_event_rings doc.

No code change.  Pure documentation rewrite.

Builds clean in both --enable-wait-event-timing and stub
configurations.  Main regress (249 tests) and
test_wait_event_stress TAP (15 subtests) pass on the timing
build; main regress (249 tests) passes on the stub build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DmitryNFomin added a commit that referenced this pull request May 14, 2026
…s/comments

review_6.md issue #9 flagged that documentation comments
over-reference "future ASH/AWR/10046-style background workers"
as justification for the cross-backend trace surface.  On
-hackers this reads as speculative -- naming hypothetical
consumers to motivate present code.  PG style prefers factual
statements about what the code does, with neutral framing for
consumers ("any extension that reads via the snapshot pattern").

The framing is also now doubly stale: since review_6 was
written we landed pg_get_wait_event_trace (review_6.md issue
#1) as the in-tree cross-backend consumer.  The "future" tense
is wrong as well as speculative.

Sweep through wait_event_timing.h, wait_event_timing.c, and
monitoring.sgml.  In each speculative reference replace the
"future ASH/AWR/10046-style background workers" phrasing with
neutral framing that:

  * names the in-tree consumer (pg_get_wait_event_trace) where
    that is what callers should use today,
  * refers to "extensions following the snapshot pattern
    documented on WaitEventTraceControl" for the
    direct-shared-memory case,
  * drops the speculative consumer enumeration entirely.

References to "10046-style trace" as a descriptive label for
the technique (e.g. "Per-session wait event trace ring buffer
(10046-style)" on WaitEventTraceState) are retained -- that is
a recognised term for the per-session-event-trace
profiling style, used here as a descriptor of the design, not
as a forward-looking promise about a particular consumer.

Files touched:

  * doc/src/sgml/monitoring.sgml: pg_backend_wait_event_trace
    cross-backend-tools paragraph, pg_get_wait_event_trace
    description, and pg_stat_clear_orphaned_wait_event_rings
    description.
  * src/include/utils/wait_event_timing.h: slot-lifecycle
    comment block (3 spots).
  * src/backend/utils/activity/wait_event_timing.c:
    wait_event_trace_before_shmem_exit doc, the
    pg_backend_wait_event_trace SRF doc, the
    pg_get_wait_event_trace SRF doc, and the
    pg_stat_clear_orphaned_wait_event_rings doc.

No code change.  Pure documentation rewrite.

Builds clean in both --enable-wait-event-timing and stub
configurations.  Main regress (249 tests) and
test_wait_event_stress TAP (15 subtests) pass on the timing
build; main regress (249 tests) passes on the stub build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@DmitryNFomin DmitryNFomin force-pushed the wait-event-timing branch 5 times, most recently from 902df44 to adc517e Compare May 15, 2026 07:10
@DmitryNFomin
Copy link
Copy Markdown
Owner Author

DmitryNFomin commented May 15, 2026

Full regression battery for 8b77f0f

All tests below were run against 8b77f0f739 on a dedicated benchmark machine (CCX43 — 16 dedicated AMD EPYC Milan vCPU / 64 GB RAM). Includes a new eviction-stress suite designed to maximise the wait-event transition rate.

1. pgbench -S (read-only, 16c/8j/30s, 4 GB shared_buffers) — 5 runs each

run BASELINE WET/off WET/stats WET/trace
1 163,656 165,540 162,655 166,045
2 162,859 167,232 164,518 165,465
3 165,870 167,471 163,137 166,766
4 165,444 167,534 165,016 167,480
5 166,466 166,425 165,303 162,679
mean TPS 164,859 166,840 164,126 165,687
vs BASELINE +1.20 % −0.44 % +0.50 %

All three modes are within run-to-run noise of vanilla — the compile-time penalty is gone.

2. pgbench TPC-B (16c/8j/30s) — 5 runs each

run BASELINE WET/off WET/stats WET/trace
1 17,611 17,414 17,297 17,251
2 17,520 17,576 17,212 17,234
3 17,554 17,596 17,125 17,428
4 17,485 17,416 17,262 17,304
5 17,673 17,434 17,303 17,351
mean TPS 17,569 17,487 17,240 17,314
vs BASELINE −0.46 % −1.87 % −1.45 %

3. evict_stress — new high-wait-event-transition-rate suite

shared_buffers = 32 MB versus a ~1.5 GB working set forces ~98 % buffer miss; every miss generates a BufferIO / DataFileRead wait event. Combined with high concurrency this drives the wait-event hot path at maximum rate.

test scenario mode TPS Δ vs off alive
W1 32 MB SB + pgbench -S 16c off 116,487
stats 116,625 +0.1 %
trace 114,844 −1.4 %
W2 32 MB SB + TPC-B 16c off 16,813
stats 16,766 −0.3 %
trace 16,698 −0.7 %
W3 32 MB SB + pgbench -S 32c oversubscribed off 162,845
stats 158,880 −2.4 %
trace 159,141 −2.3 %
W4 32 MB SB + hot-row UPDATE 32c off 8,138
stats 7,772 −4.5 %
trace 7,814 −4.0 %

Sample of recorded wait events under W1/trace verifies the timing path actually captures the expected work:

 wait_event_type |    wait_event
-----------------+-------------------
 IO              | DataFileRead       (dominant — buffer-miss reads)
 Activity        | BgwriterMain
 Activity        | WalWriterMain
 IO              | WalWrite
 LWLock          | WaitEventTraceDSA
 IO              | AioIoCompletion

No crashes in any of the 12 sub-runs across the four eviction-stress scenarios. Recording overhead under maximum-pressure scenarios peaks at ~4.5 % in W4 (where every transaction contends on the same row → many lock-wait transitions per tx). This is the real cost of writing per-event statistics, bounded and proportional to the wait-event rate.

4. Stress regression (stress2.sh, T1–T6)

# scenario result
T1 FOR UPDATE 16c × 5s × {off, stats, trace} alive ✓
T2 pgbench -C connect-storm × {off, trace} alive ✓
T3 pg_ctl stop -m immediate recovery alive ✓
T4 pg_terminate_backend storm alive ✓
T5 rapid mode flips during load alive ✓
T6 UPDATE contention × {off, stats, trace} alive ✓

No PANIC, no segfault.

5. Additional stress (extra_stress.sh, E1–E7)

# scenario result
E1 50 short-lived backends + pg_stat_clear_orphaned_wait_event_rings() swept 50/50 ✓
E2 5 concurrent sessions × 20 forced-parallel aggregations alive ✓
E3 150 concurrent pg_get_wait_event_trace() reads vs writer alive ✓
E4 tiny ring (8 KB = 256 records) under pgbench -S alive ✓
E5 huge ring (32 MB = 1M records) under pgbench -S alive ✓
E6 VACUUM ANALYZE × 5 concurrent with TPC-B alive ✓
E7 mode-flip storm — flip every ~1.2 s for 30 s during load alive ✓

6. Streaming replication (repl_test.sh)

Primary + hot-standby both with wait_event_capture = trace.

metric value
pgbench on primary 12,257 TPS
standby count(*) after replay 1,000,000 rows ✓
walreceiver / startup wait events captured yes (WalReceiverMain, RecoveryConflictSnapshot)
crash in either log none

7. check-world (full upstream regression suite)

All tests successful.
Result: PASS

Both make check and TAP-suite tests passed end-to-end (set -e would have aborted on any failure).

8. ASAN — rebuilt with new source

inst-asan rebuilt from 8b77f0f739 with -fsanitize=address; ldd confirms libasan.so.8 linked.

phase scenario
A pgbench -S 8c × 20s × {off, stats, trace}
B pgbench TPC-B 8c × 20s × {stats, trace}
C pgbench -C -S connect-storm 8c × 15s × trace
D SELECT FOR UPDATE 16c × 15s × trace
E pg_terminate_backend storm

Postmaster log: no ASAN issues detected (no heap-use-after-free, no buffer-overflow, no UB).

Client-side pgbench leak warnings are pre-existing pg_strdup / lookupCreateVariable leaks in upstream pgbench code, unrelated to this patch.

DmitryNFomin and others added 4 commits May 15, 2026 15:42
PostgreSQL already exposes the *current* wait event of each
backend in pg_stat_activity, but there is no in-core way to
attribute waits across time: no per-backend counts, no
duration histograms, no event sequence.  Production diagnosis
of "what was this backend waiting on?" requires sampling
pg_stat_activity at high frequency, which both undercounts
short waits and burdens the cluster.

This patch adds opt-in instrumentation that records wait
events for the calling backend in two modes:

  * stats -- count, total/avg/max duration, and 32-bucket
    log2-spaced duration histogram, per (backend, wait-event)
    pair.  Memory cost: ~26 KB/backend (DSA-backed, lazily
    allocated when capture is first enabled cluster-wide).

  * trace -- everything in stats, plus a per-session ring
    buffer of individual events plus query-attribution
    markers (ExecStart/ExecEnd, QueryStart/QueryEnd).  This is
    the per-session-event-trace style ("10046-style") used by
    monitoring extensions and on-host workload analyzers.

The instrumentation is gated by a new compile-time flag
--enable-wait-event-timing.  Production builds without the
flag pay zero cost: the pgstat_report_wait_start/end inline
path's only addition is an `#ifdef USE_WAIT_EVENT_TIMING`
block that is compiled out.  Builds with the flag compiled in
but the wait_event_capture GUC at the default off value pay
one well-predicted branch per wait event (the OFF check
folds to a const compare against zero, branch-predicted
straight-line).

New runtime knobs:

  * wait_event_capture (PGC_SUSET, enum off | stats | trace,
    default off).  Per-session GUC; superusers grant
    SET ON PARAMETER to extend.  Stepping down trace -> off
    immediately releases the ring; stats counters survive
    capture toggles until an explicit reset.

  * wait_event_timing_max_tranches (PGC_POSTMASTER, int,
    default 192).  Cap on distinct LWLock tranches each
    backend tracks individually.  Excess tranches counted
    against lwlock_overflow_count.

  * wait_event_trace_ring_size_kb (PGC_POSTMASTER, int,
    default 4096 KB, power-of-two, range 8..32768).  Per-
    backend trace ring buffer size.  Tuneable per
    deployment: lower at high max_connections to reduce
    worst-case DSA footprint; raise for longer retention
    before the FIFO wrap.

New SQL surface (all REVOKE'd from PUBLIC, GRANT'd to
pg_read_all_stats):

  * pg_stat_wait_event_timing (view), backed by
    pg_stat_get_wait_event_timing(pid int4) -- per-backend
    per-event counts/durations/histograms.

  * pg_stat_wait_event_timing_overflow (view), backed by
    pg_stat_get_wait_event_timing_overflow(pid int4) --
    counts of waits dropped to overflow (LWLock-hash full,
    event-class flat array full) per backend.

  * pg_backend_wait_event_trace (view), backed by
    pg_get_backend_wait_event_trace() -- this session's
    own trace ring, materialised.

  * pg_get_wait_event_trace(procnumber int4) (SRF) -- the
    canonical cross-backend reader.  Reads OWNED and
    ORPHANED slots uniformly using a position-encoded
    identity seqlock against shared memory.

  * pg_stat_reset_wait_event_timing(pid int4) and
    pg_stat_reset_wait_event_timing_all() -- reset counters
    for one backend or cluster-wide.  Uses a generation-
    counter retry protocol that coalesces concurrent resets
    without losing in-flight wait observations.

  * pg_stat_clear_orphaned_wait_event_rings() (superuser) --
    explicit DBA sweep that releases orphaned trace rings
    from backends that exited while
    wait_event_capture = trace.

Trace-ring lifecycle and orphan semantics:

  At backend exit the per-session trace ring is NOT freed.
  The slot transitions OWNED -> ORPHANED and the ring stays
  allocated in DSA so cross-backend readers (the in-tree
  pg_get_wait_event_trace and any extension that follows
  the snapshot pattern documented on WaitEventTraceControl)
  can read the dying backend's final waits.  This is the
  one observable scenario that motivated the lifecycle work:
  parallel workers exit in milliseconds at end-of-parallel-
  query, well below any monitoring tool's polling interval,
  and without orphan-persistence their final waits are lost.

  Orphans are reclaimed lazily in three places:
  (1) clear_orphan_at_init runs at every new backend's
      InitProcess and frees the prior orphan whose
      procNumber this backend inherits; this drains
      orphans naturally as connection churn cycles slots.
  (2) The session that disabled tracing releases its own
      ring synchronously from the GUC assign hook.
  (3) The DBA-driven sweep pg_stat_clear_orphaned_wait_
      event_rings handles the pathological case of capture
      briefly enabled then disabled on a cluster with
      long-lived pooled connections that never recycle
      procNumbers.

Worst-case orphan footprint is bounded by
NUM_WAIT_EVENT_TIMING_SLOTS * wait_event_trace_ring_size_kb
(default 4 MB; ~400 MB at max_connections=100, ~4 GB at
1000).

Hot-path design:

  pgstat_report_wait_start/end remain inline.  The new gate
  if (unlikely(wait_event_capture != WAIT_EVENT_CAPTURE_OFF))
  steers the predictor to the no-op fall-through.  Inside,
  the per-backend slot pointer is lazy-attached with a one-
  shot in_attach re-entrancy guard so a nested wait inside
  the DSA-attach internal LWLock cannot recurse.

  The trace-ring writer is lock-free for the owning backend
  (single-writer claim via plain read+write of write_pos,
  avoiding the LOCK XADD that pg_atomic_fetch_add_u64 would
  emit) and uses a per-record position-encoded identity
  seqlock (rec->seq = pos*2+1 mid-write, pos*2+2 complete)
  to coordinate with cross-backend readers.  The identity
  encoding rejects all four failure modes -- stale prior
  cycle, mid-write current cycle, wrapped-past-us next
  cycle, and torn write completed mid-read -- which a
  parity-only seqlock would miss on weak-memory archs.

  The writer hot path also coordinates with the GUC assign
  hook's release_slot via a per-backend
  wait_event_trace_writes_disabled gate so a dsa_free in
  the release path can't UAF the ring through a nested
  wait event.

LWLock-timing storage:

  Each backend's per-tranche histograms live in a small
  open-addressing hash table sized at server start by
  wait_event_timing_max_tranches.  Hits are O(1) amortised;
  the probe count per lookup is bounded so a pathological
  fill never spins.  Tranches encountered after the table
  fills are counted against lwlock_overflow_count and not
  individually timed.

Stub build (without --enable-wait-event-timing):

  Backing variables for all four GUCs are defined
  unconditionally so the GUC framework has somewhere to
  store the value.  The check hook for wait_event_capture
  rejects non-off values (warning + force to off at server
  start, hard error at SET) so misconfigurations are
  diagnosed loudly.  SQL stubs of all SRFs return empty
  results; the reset/clear functions ereport
  ERRCODE_FEATURE_NOT_SUPPORTED.  Marker entry points
  (wait_event_trace_query_start etc.) are no-op stubs so
  call sites in execMain.c, backend_status.c, and
  postgres.c do not need #ifdef guards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
monitoring.sgml gets three new view entries
(pg_stat_wait_event_timing, pg_stat_wait_event_timing_overflow,
pg_backend_wait_event_trace) and new function entries under
monitoring-stats-funcs (pg_get_wait_event_trace,
pg_stat_reset_wait_event_timing, pg_stat_reset_wait_event_
timing_all, pg_stat_clear_orphaned_wait_event_rings), plus a
section on the cross-backend snapshot protocol that
monitoring extensions can use to read another backend's
trace ring without going through the session-local SQL view.

config.sgml gets the three new GUC entries
(wait_event_capture, wait_event_timing_max_tranches,
wait_event_trace_ring_size_kb) in the cumulative statistics
section, including memory-footprint guidance for sizing
wait_event_trace_ring_size_kb against max_connections.

installation.sgml documents the --enable-wait-event-timing
configure flag, including the hot-path zero-cost guarantee
in builds that compile the feature in but leave
wait_event_capture = off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regression tests
----------------

src/test/regress/sql/wait_event_timing.sql exercises the GUC,
the timing/overflow/trace SQL surface, and the histogram
taxonomy view.  Two expected outputs are provided:

  * wait_event_timing.out -- for --enable-wait-event-timing
    builds, asserts that a SET wait_event_capture = stats
    followed by pg_sleep(0.1) lands a PgSleep entry with
    non-zero counts, durations, and a valid 32-bucket
    histogram whose entries sum to the visible call count
    (call-count invariant: count == sum(histogram)).

  * wait_event_timing_1.out -- for stub builds, asserts the
    SET fails with the expected feature-not-supported error
    and the SQL functions return empty results.

The parallel_schedule places wait_event_timing in its own
group; the test cannot share state with other tests, and the
two-phase reset_count invariant breaks if a foreign session
SETs the GUC between the test's two stat-snapshot queries.

Stress / TAP coverage
---------------------

src/test/modules/test_wait_event_stress wraps a small C
helper (stress_wait_events) that exercises the LWLock-timing
hash and the wait-event flat array, with deterministic counts.
The module's SQL regression script checks the
write_pos == sum(per_backend_calls) invariant under load.

Two TAP tests cover the lifecycle paths that the regression
harness cannot exercise (asynchronous backend exits, parallel
workers, ring wraparound):

  * t/001_orphan_roundtrip.pl -- four scenarios for the
    trace-ring orphan lifecycle:
      1. Plain-backend orphan roundtrip (writer exits while
         capture is trace, reader observes the ORPHANED ring,
         admin sweep clears it).
      2. Parallel-worker orphan roundtrip (forced-parallel
         query with shm_mq tuple flow, asserts the admin
         sweep reclaims leader + worker orphans).
      3. OWNED-slot read with a concurrent live writer
         (verifies cross-backend reads against an actively-
         written ring return only well-formed rows).
      4. clear_orphan_at_init reclaim (writer disconnects,
         many new backends spawn until one inherits the
         writer's procNumber, verifies the inheritor's
         InitProcess freed the orphan).  Bounded by the
         ProcGlobal->freeProcs FIFO depth.

  * t/002_ring_wrap.pl -- provisions a cluster with the
    smallest legal wait_event_trace_ring_size_kb (8 KB =
    256 records), drives a writer through enough waits to
    wrap many times, and asserts the session-local and
    cross-backend readers both return at most ring-size
    well-formed records with post-wrap seq values.  Verifies
    the writer's `pos & ring_mask` indexing and the per-
    record identity seqlock survive the wrap regime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the configure flag to a single Linux task in
.cirrus.tasks.yml so the timing/trace code paths get
buildfarm-equivalent coverage on every push.  The other CI
tasks continue to build without the flag, so the stub-build
path (and its alternate-output regression test) remains
exercised at every push as well.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant