Skip to content

Add PostgreSQL committer activity analysis and visualization#29

Draft
NikolayS wants to merge 2724 commits into
masterfrom
claude/verify-git-history-PaBip
Draft

Add PostgreSQL committer activity analysis and visualization#29
NikolayS wants to merge 2724 commits into
masterfrom
claude/verify-git-history-PaBip

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

Summary

This PR adds a new analysis tool to visualize commit activity trends for PostgreSQL's top-10 committers over time, along with GitHub Actions workflows for Claude Code integration.

Key Changes

  • analysis/generate_chart.py: New Python script that:

    • Parses git commit history data (year and author) from a data file
    • Identifies the top-10 most prolific committers
    • Generates two visualization charts:
      • Line chart showing individual committer activity trends over time
      • Stacked area chart showing cumulative contribution trends
    • Exports data to CSV format for further analysis
    • Uses matplotlib with dark theme styling and highlights Bruce Momjian with enhanced visibility
  • analysis/top10_activity.csv: Pre-generated dataset containing commit counts by year for the top-10 committers from 1996-2026

  • analysis/top10_activity.png and analysis/top10_activity_stacked.png: Generated visualization outputs

  • analysis/.gitignore: Excludes the large git log input file (~1.2 MB) from version control

  • .github/workflows/claude.yml and .github/workflows/claude-code-review.yml: GitHub Actions workflows for Claude Code integration (comment-triggered and PR-triggered code assistance)

Notable Implementation Details

  • The script uses defaultdict for efficient year-by-author aggregation
  • Visualization includes total commit counts in legend labels for context
  • Bruce Momjian is visually emphasized with thicker lines and full opacity
  • The stacked chart legend is reversed for better readability with the stacking order
  • Includes clear documentation on how to regenerate input data using git log

https://claude.ai/code/session_01SAWhyKXpDkFpwPiCP3EZ7T

anarazel and others added 30 commits April 5, 2026 00:43
In a subsequent commit the read-ahead distance will only be increased when
waiting for IO. Without further work that would cause a regression: As IO
combining and read-ahead are currently controlled by the same mechanism, we
would end up not allowing IO combining when never needing to wait for IO (as
the distance ends up too small to allow for full sized IOs), which can
increase CPU overhead. A typical reason to not have to wait for IO completion
at a low look-ahead distance is use of io_uring with the to-be-read data in
the page cache. But even with worker the IO submission rate may be low enough
for the worker to keep up.

One might think that we could just always perform IO combining, but doing so
at the start of a scan can cause performance regressions:

1) Performing a large IO commonly has a higher latency than smaller IOs. That
   is not a problem once reading ahead far enough, but at the start of a stream
   it can lead to longer waits for IO completion.

2) Sometimes read streams will not be read to completion. Immediately starting
   with full sized IOs leads to more wasted effort. This is not commonly an
   issue with existing read stream users, but the upcoming use of read streams
   to fetch table pages as part of an index scan frequently encounters this.

Solve this issue by splitting ReadStream->distance into ->combine_distance and
->readahead_distance. Right now they are increased/decreased at the same time,
but that will change in the next commit.

One of the comments in read_stream_should_look_ahead() refers to a motivation
that only really exists as of the next commit, but without it the code doesn't
make sense on its own.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
This avoids increasing the distance to the maximum in cases where the I/O
subsystem is already keeping up. This turns out to be important for
performance for two reasons:

- Pinning a lot of buffers is not cheap. If additional pins allow us to avoid
  IO waits, it's definitely worth it, but if we can already do all the
  necessary readahead at a distance of 16, reading ahead 512 buffers can
  increase the CPU overhead substantially.  This is particularly noticeable
  when the to-be-read blocks are already in the kernel page cache.

- If the read stream is read to completion, reading in data earlier than
  needed is of limited consequences, leaving aside the CPU costs mentioned
  above. But if the read stream will not be fully consumed, e.g. because it is
  on the inner side of a nested loop join, the additional IO can be a serious
  performance issue. This is not that commonly a problem for current read
  stream users, but the upcoming work, to use a read stream to fetch table
  pages as part of an index scan, frequently encounters this.

Note that this commit would have substantial performance downsides without
earlier commits:

- Commit 6e36930, which avoids decreasing the readahead distance when
  there was recent IO, is crucial, as otherwise we very often would end up not
  reading ahead aggressively enough anymore with this commit, due to
  increasing the distance less often.

- "read stream: Split decision about look ahead for AIO and combining" is
  important as we would otherwise not perform IO combining when the IO
  subsystem can keep up.

- "aio: io_uring: Trigger async processing for large IOs" is important to
  continue to benefit from memory copy parallelism when using fewer IOs.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Tested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu
Discussion: https://postgr.es/m/CA+hUKGL2PhFyDoqrHefqasOnaXhSg48t1phs3VM8BAdrZqKZkw@mail.gmail.com
Merge pgaio_worker_submit_internal() and pgaio_worker_submit().  The
separation didn't serve any purpose.

Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com
READ ONLY transactions should prevent modifications to foreign data as
well as local data, but postgres_fdw transactions declared as READ ONLY
that reference foreign tables mapped to a remote view executing volatile
functions would modify data on remote servers, as it would open remote
transactions in READ WRITE mode.

Similarly, DEFERRABLE transactions should not abort due to a
serialization failure even when accessing foreign data, but postgres_fdw
transactions declared as DEFERRABLE would abort due to that failure in a
remote server, as it would open remote transactions in NOT DEFERRABLE
mode.

To fix, modify postgres_fdw to open remote transactions in the same
access/deferrable modes as the local transaction.  This commit also
modifies it to open remote subtransactions in the same access mode as
the local subtransaction.

This commit changes the behavior of READ ONLY/DEFERRABLE transactions
using postgres_fdw; in particular, it doesn't allow the READ ONLY
transactions to modify data on remote servers anymore, so such
transactions should be redeclared as READ WRITE or rewritten using other
tools like dblink.  The release notes should note this as an
incompatibility.

These issues exist since the introduction of postgres_fdw, but to avoid
the incompatibility in the back branches, fix them in master only.

Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com
Discussion: https://postgr.es/m/E1uLe9X-000zsY-2g%40gemulon.postgresql.org
A future REPACK patch wants a way to suppress index_build doing its
progress reports when building an index, because that would interfere
with repack's own reporting; so add an INDEX_CREATE_SUPPRESS_PROGRESS
bit that enables this.

Furthermore, change the index_create_copy() API so that it takes flag
bits for index_create() and passes them unchanged.  This gives its
callers more direct control, which eases the interface -- now its
callers can pass the INDEX_CREATE_SUPPRESS_PROGRESS bit directly.  We
use it for the current caller in REINDEX CONCURRENTLY, since it's also
not interested in progress reporting, since it doesn't want
index_build() to be called at all in the first place.

One thing to keep in mind, pointed out by Mihail, is that we're not
suppressing the index-AM-specific progress report updates which happen
during ambuild().  At present this is not a problem, because the values
updated by those don't overlap with those used by commands other than
CREATE INDEX; but maybe in the future we'll want the ability to suppress
them also.  (Alternatively we might want to display how each
index-build-subcommand progresses during REPACK and others.)

Author: Antonin Houska <ah@cybertec.at>
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Discussion: https://postgr.es/m/102906.1773668762@localhost
Add parse_ddl_options(), append_ddl_option(), and append_guc_value()
helper functions in a new ddlutils.c file that provide common option
parsing and output formatting for the pg_get_*_ddl family of functions
which will follow in later patches.  These accept VARIADIC text
arguments as alternating name/value pairs.

Callers declare an array of DdlOption descriptors specifying the
accepted option names and their types (boolean, text, or integer).
parse_ddl_options() matches each supplied pair against the array,
validates the value, and fills in the result fields.  This
descriptor-based scheme is based on an idea from Euler Taveira.

This is placed in a new ddlutils.c file which will contain the
pg_get_*_ddl functions.

Author: Akshay Joshi <akshay.joshi@enterprisedb.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com
Discussion: https://postgr.es/m/4c5f895e-3281-48f8-b943-9228b7da6471@gmail.com
Discussion: https://postgr.es/m/CANxoLDc6FHBYJvcgOnZyS+jF0NUo3Lq_83-rttBuJgs9id_UDg@mail.gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
Add a new SQL-callable function that returns the DDL statements needed
to recreate a role. It takes a regrole argument and an optional VARIADIC
text argument for options that are specified as alternating name/value
pairs. The following options are supported: pretty (boolean) for
formatted output and memberships (boolean) to include GRANT statements
for role memberships and membership options. The return is one or
multiple rows where the first row is a CREATE ROLE statement and
subsequent rows are ALTER ROLE statements to set some role properties.
Password information is never included in the output.

The caller must have SELECT privilege on pg_authid.

Author: Mario Gonzalez <gonzalemario@gmail.com>
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Quan Zongliang <quanzongliang@yeah.net>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/4c5f895e-3281-48f8-b943-9228b7da6471@gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
Add a new SQL-callable function that returns the DDL statements needed
to recreate a tablespace. It takes a tablespace name or OID and an
optional VARIADIC text argument for options that are specified as
alternating name/value pairs. The following options are supported: pretty
(boolean) for formatted output and owner (boolean) to include OWNER.
(It includes two variants because there is no regtablespace pseudotype.)
The return is one or multiple rows where the first row is a CREATE
TABLESPACE statement and subsequent rows are ALTER TABLESPACE statements
to set some tablespace properties.

The caller must have SELECT privilege on pg_tablespace.

get_reloptions() in ruleutils.c is made non-static so it can be called
from the new ddlutils.c file.

Author: Nishant Sharma <nishant.sharma@enterprisedb.com>
Author: Manni Wood <manni.wood@enterprisedb.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
Add a new SQL-callable function that returns the DDL statements needed
to recreate a database. It takes a regdatabase argument and an optional
VARIADIC text argument for options that are specified as alternating
name/value pairs. The following options are supported: pretty (boolean)
for formatted output, owner (boolean) to include OWNER and tablespace
(boolean) to include TABLESPACE. The return is one or multiple rows
where the first row is a CREATE DATABASE statement and subsequent rows are
ALTER DATABASE statements to set some database properties.

The caller must have CONNECT privilege on the target database.

Author: Akshay Joshi <akshay.joshi@enterprisedb.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Co-authored-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Quan Zongliang <quanzongliang@yeah.net>
Discussion: https://postgr.es/m/CANxoLDc6FHBYJvcgOnZyS+jF0NUo3Lq_83-rttBuJgs9id_UDg@mail.gmail.com
Discussion: https://postgr.es/m/e247c261-e3fb-4810-81e0-a65893170e94@dunslane.net
While working on refactoring how shmem is allocated, I made a mistake
where the main LWLock array did not reserve space for the LWLocks
allocated with RequestNamedLWLockTranche(), and the test still
passed. Matthias van de Meent spotted that before it got committed,
but in order to catch such mistakes in the future, add checks in
test_lwlock_tranches that the locks allocated with
RequestNamedLWLockTranche() can be acquired and released.

Another change is to stop requesting multiple tranches with the same
name with RequestNamedLWLockTranche(). As soon as I started to test
using the locks I realized that's bogus, and the next commit will
forbid it. Keep test coverage for duplicates requested with
LWLockNewTrancheId() for now, but make it more clear that that's what
the test does.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/463a28db-0c0b-4af6-bac6-3891828bbbfe@iki.fi
Discussion: https://www.postgresql.org/message-id/CAEze2WjgCROMMXY0+j8FFdm3iFcr7By-+6Mwiz=PgGSEydiW3A@mail.gmail.com
You could request two tranches with same name, but things would get
confusing when you called GetNamedLWLockTranche() to get the LWLocks
allocated for them; it would always return the first tranche with the
name. That doesn't make sense, so forbid duplicates.

We still allow duplicates with LWLockNewTrancheId(). That works better
as you don't use the name to look up the tranche ID later. It's still
confusing in wait events, for example, but it's not dangerous in the
same way.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/463a28db-0c0b-4af6-bac6-3891828bbbfe@iki.fi
The two new functions allow to extract the block number and offset from a tid.

There are existing ways to do so (e.g. by doing (ctid::text::point)[0]), but
they are hard to remember and not pretty.

tid_block() returns int8 (bigint) because BlockNumber is uint32, which exceeds
the range of int4. tid_offset() returns int4 (integer) because OffsetNumber is
uint16, which fits safely in int4.

Bumps catversion.

Author: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAJTYsWUzok2+mvSYkbVUwq_SWWg-GdHqCuYumN82AU97SjwjCA@mail.gmail.com
The database name was warned about when building with
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, leading to BF and CI failures.

It is somewhat confusing that the required prefix is different for databases
than other object types.

Also fix a pgindent violation that caused koel to start to fail.

Discussion: https://postgr.es/m/ptyiexyhmtxf4lm524s7o7w64r26ra237uusv4tjav4yhpmeoo@vfwwllz7tivb
Introduce TriggerInstrumentation to capture trigger timing and firings
(previously counted in "ntuples"), to aid a future refactoring that
splits out all Instrumentation fields beyond timing and WAL/buffers into
more specific structs.

In passing, drop the "n" argument to InstrAlloc, as all remaining callers need
exactly one Instrumentation struct.  The duplication between InstrAlloc() and
InstrInit(), as well as the conditional initialization of async_mode will be
addressed in a subsequent commit.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
Previously, different places (e.g. query "total time") were repurposing the
Instrumentation struct initially introduced for capturing per-node statistics
during execution. This overuse of the same struct is confusing, e.g. by
cluttering calls of InstrStartNode/InstrStopNode in unrelated code paths, and
prevents future refactorings.

Instead, simplify the Instrumentation struct to only track time and WAL/buffer
usage. Similarly, drop the use of InstrEndLoop outside of per-node
instrumentation - these calls were added without any apparent benefit since
the relevant fields were never read.

Introduce the NodeInstrumentation struct to carry forward the per-node
instrumentation information. WorkerInstrumentation is renamed to
WorkerNodeInstrumentation for clarity.

In passing, clarify that InstrAggNode is expected to only run after
InstrEndLoop (as it does in practice), and drop unused code.

This also fixes a consequence-less bug: Previously ->async_mode was only set
when a non-zero instrument_option was passed. That turns out to be harmless
right now, as ->async_mode only affects a timing related field.

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
A little refactoring in preparation for the next commit, to make the
material changes in that commit more clear.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This replaces the [Subsystem]ShmemSize() and [Subsystem]ShmemInit()
functions called at postmaster startup with a new set of callbacks.
The new mechanism is designed to be more ergonomic. Notably, the size
of each shmem area is specified in the same ShmemRequestStruct() call,
together with its name. The same mechanism is used in extensions,
replacing the shmem_{request/startup}_hooks.

ShmemInitStruct() and ShmemInitHash() become backwards-compatibility
wrappers around the new functions. In future commits, I will replace
all ShmemInitStruct() and ShmemInitHash() calls with the new
functions, although we'll still need to keep them around for
extensions.

Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
The old ShmemInit{Struct/Hash}() functions could be used after
postmaster statup, as long as the allocation is small enough to fit in
spare shmem reserved at startup. I believe some extensions do that,
although we hadn't really documented it and had not coverage for it.
The new test module covers that after-startup usage with the new
ShmemRequestStruct() functions.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
As part of this, embed the LWLock it needs in the shared memory struct
itself, so that we don't need to use RequestNamedLWLockTranche()
anymore. LWLockNewTrancheId() + LWLockInitialize() is more convenient
to use in extensions.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
To add a new built-in subsystem, add it to subsystemslist.h. That
hooks up its shmem callbacks so that they get called at the right
times during postmaster startup. For now this is unused, but will
replace the current SubsystemShmemSize() and SubsystemShmemInit()
calls in the next commits.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
It seems like a good candidate to convert first because it needs to
initialized before any other subsystem, but other than that it's
nothing special.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
These subsystems have some complicating properties, making them
slightly harder to convert than most:

- The initialization callbacks of some of these subsystems have
  dependencies, i.e. they need to be initialized in the right order.

- The ProcGlobal pointer still needs to be inherited by the
  BackendParameters mechanism on EXEC_BACKEND builds, because
  ProcGlobal is required by InitProcess() to get a PGPROC entry, and
  the PGPROC entry is required to use LWLocks, and usually attaching
  to shared memory areas requires the use of LWLocks.

- Similarly, ProcSignal pointer still needs to be handled by
  BackendParameters, because query cancellation connections access it
  without calling InitProcess

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This is in preparation to convert it to use the new shmem allocation
functions, making the next commit that does that smaller. This inlines
SerialInit() to the caller, and moves all the initialization steps
within PredicateLockShmemInit() to happen after all the
ShmemInit{Struct|Hash}() calls.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
I replaced the old SimpleLruInit() function without a backwards
compatibility wrapper, because few extensions define their own SLRUs.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This replaces the "shmem_size" and "shmem_init" callbacks in the IO
methods table with the same ShmemCallback struct that we now use in
other subsystems

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
The buffer blocks, converted to use ShmemRequestStruct() in the next
commit, are IO-aligned. This might come handy in other places too, so
make it an explicit feature of ShmemRequestStruct().

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This rectifies the initialization functions a little, making the
"buffer strategy" stuff in freelist.c and buffer mapping hash table in
buf_init.c top-level "subsystems" of their own, registered directly in
subsystemlist.h. Previously they were called indirectly from
BufferManagerShmemInit() and BufferManagerShmemSize()

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
This removes all remaining uses of ShmemInitStruct() and
ShmemInitHash() from built-in code.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
Commit 7c64d56 has removed the isolation test providing coverage
for lock statistics due to some instability in the CI, where the
deadlock timeout may not have enough time to process, preventing the
stats data to be updated.  These also relied on a set of hardcoded
sleeps.

This commit switches the test suite to TAP, instead, that uses an
injection point with a wait to avoid the sleeps.  The injection point is
added in ProcSleep(), once we know that the deadlock timeout has fired
and that the stats have been updated.

Multiple lock patterns are checked, all rely on the same workflow, with
two sessions:
- session 1 holds a given lock type.
- session 2 attaches to the new injection point with the wait action.
- session 2 attempts to acquire a lock conflicting with the lock of
session 1, waiting for the injection point to be reached.
- session 1 releases its lock, session 2 commits.
- pg_stat_lock is polled until the counters are updated for the lock
type.

Bertrand's version of the patch introduced a new routine to
BackgroundPsql() to detect the blocked background sessions.  I have
tweaked the test so as we use the same method as some of the other tests
instead, based on some \echo commands.  This test has been run multiple
times in the CI, all passing, so I'd like to think that this is more
stable than the first version attempted.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/acNTR1lLHwQJ0o+P@ip-10-97-1-34.eu-west-3.compute.internal
If the background worker for processing databases manages to finish
before the launcher starts waiting for it, the launcher would treat
it erroneously as an error.  Fix by ensureing to check result state
in this case.  Identified on CI and synthetically reproduced during
local testing.

Also while, make sure to properly lock the shared memory structure
before updating tje result state.

Author: Daniel Gustafsson <daniel@yesql.seA
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/4fxw37ge47v5baeozla5phymi233hxbcjbwwsfwv3mpg3kyl2z@6jk4nkf6jp4
michaelpq and others added 19 commits April 21, 2026 14:46
This batch is similar to 462fe0f and addresses a variety of code
style issues, including grammar mistakes, typos, inconsistent variable
names in function declarations, and incorrect function names in comments
and documentation.  These fixes have accumulated on the community
mailing lists since the commit mentioned above.

Notably, Alexander Lakhin previously submitted a patch identifying many
of the trivial typos and grammar issues that had been reported on
pgsql-hackers.  His patch covered a somewhat large portion of the issues
addressed here, though not all of them.

The documentation changes only affect HEAD.
We were using "select count(*) into x from generate_series(1,
1_000_000_000_000)" to waste one second waiting for a statement
timeout trap.  Aside from consuming CPU to little purpose, this could
easily eat several hundred MB of temporary file space, which has been
observed to cause out-of-disk-space errors in the buildfarm.
Let's just use "pg_sleep(10)", which is far less resource-intensive.

Also update the "when others" exception handler so that if it does
ever again trap an error, it will tell us what error.  The cause of
these intermittent buildfarm failures had been obscure for awhile.

Discussion: https://postgr.es/m/557992.1776779694@sss.pgh.pa.us
Backpatch-through: 14
We installed this in commit eea9fa9 to protect against foreseeable
mistakes that would break ABI in stable branches by renumbering
NodeTag enum entries.  However, we now have much more thorough
ABI stability checks thanks to buildfarm members using libabigail
(see the .abi-compliance-history mechanism).  So this incomplete,
single-purpose check seems like an anachronism.  I wouldn't object
to keeping it were it not that it requires an additional manual step
when making a new stable git branch.  That seems like something easy
to screw up, so let's get rid of it.

This patch just removes the logic that checks for changes in the last
auto-assigned NodeTag value.  We still need eea9fa9's cross-check
on the supplied list of header files, to prevent divergence between
the makefile and meson build systems.  We'll also sometimes need the
nodetag_number() infrastructure for hand-assigning new NodeTags in
stable branches.

Discussion: https://postgr.es/m/1458883.1776143073@sss.pgh.pa.us
GetLocalPinLimit() and GetAdditionalLocalPinLimit(), currently in use
only by the read stream, previously allowed a backend to pin all
num_temp_buffers local buffers. This meant that the read stream could
use every available local buffer for read-ahead, leaving none for other
concurrent pin-holders like other read streams and related buffers like
the visibility map buffer needed during on-access pruning.

This became more noticeable since b46e1e5, which allows on-access
pruning to set the visibility map, which meant that some scans also
needed to pin a page of the VM. It caused a test in
src/test/regress/sql/temp.sql to fail in some cases.

Cap the local pin limit to num_temp_buffers / 4, providing some
headroom. This doesn't guarantee that all needed pins will be available
— for example, a backend can still open more cursors than there are
buffers — but it makes it less likely that read-ahead will exhaust the
pool.

Note that these functions are not limited by definition to use in the
read stream; however, this cap should be appropriate in other contexts.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/97529f5a-ec10-46b1-ab50-4653126c6889%40gmail.com
Since b46e1e5 allowed setting the VM on-access and 378a216 set
pd_prune_xid on INSERT, the testing of generic/custom plans in
src/test/regress/sql/plancache.sql was destabilized.

One of the queries of test_mode could have set the pages all-visible and
if autovacuum/autoanalyze ran and updated pg_class.relallvisible, it
would affect whether we got an index-only or sequential scan.

Preclude this by disabling autovacuum and autoanalyze for test_mode and
carefully sequencing when ANALYZE is run.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/71277259-264e-4983-a201-938b404049d7%40gmail.com
The btree_gist enum test expects a bitmap heap scan. Since b46e1e5
enabled setting the VM during on-access pruning and 378a216 set
pd_prune_xid on INSERT, scans of enumtmp may set pages all-visible.
If autovacuum or autoanalyze then updates pg_class.relallvisible, the
planner could choose an index-only scan instead.

Make the enumtmp a temp table to exclude it from autovacuum/autoanalyze.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/46733d68-aec0-4d09-8120-4c66b87047a4%40gmail.com
FlushUnlockedBuffer() accepted io_object and io_context arguments but
hardcoded IOOBJECT_RELATION and IOCONTEXT_NORMAL when calling
FlushBuffer(). Pass them through instead. Also fix FlushBuffer() to use
its io_object parameter for I/O timing stats rather than hardcoding
IOOBJECT_RELATION.

Not actively broken since all current callers pass IOOBJECT_RELATION and
IOCONTEXT_NORMAL, so not backpatched.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/BC97546F-5C15-42F2-AD57-CFACDB9657D0@gmail.com
This neglected to set TAP_TESTS = 1, and partially compensated
for that by writing duplicative hand-made rules for check and
installcheck.  That's not really sufficient though.  The way
I noticed the error was that "make distclean" didn't clean out
the tmp_check subdirectory, and there might be other consequences.
Do it the standard way instead.
This commit tweaks ALTER INDEX .. ATTACH PARTITION to attempt a
validation of a parent index in the case where an index is already
attached but the parent is not yet valid.  This occurs in cases where a
parent index was created invalid such as with CREATE INDEX ONLY, but was
left invalid after an invalid child index was attached (partitioned
indexes set indisvalid to false if at least one partition is
!indisvalid, indisvalid is true in a partitioned table iff all
partitions are indisvalid).  This could leave a partition tree in a
situation where a user could not bring the parent index back to valid
after fixing the child index, as there is no built-in mechanism to do
so.  This commit relies on the fact that repeated ATTACH PARTITION
commands on the same index silently succeed.

An invalid parent index is more than just a passive issue.  It causes
for example ON CONFLICT on a partitioned table if the invalid parent
index is used to enforce a unique constraint.

Some test cases are added to track some of problematic patterns, using a
set of partition trees with combinations of invalid indexes and ATTACH
PARTITION.

Reported-by: Mohamed Ali <moali.pg@gmail.com>
Author: Sami Imseih <sanmimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Discussion: http://postgr.es/m/CAGnOmWqi1D9ycBgUeOGf6mOCd2Dcf=6sKhbf4sHLs5xAcKVCMQ@mail.gmail.com
Backpatch-through: 14
The ri_FetchConstraintInfo() and ri_LoadConstraintInfo() functions
were declared to return const RI_ConstraintInfo *, but callers
sometimes need to modify the struct, requiring casts to drop the
const.  Remove the misapplied const qualifiers and the casts that
worked around them.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/548600ed-8bbb-4e50-8fc3-65091b122276@eisentraut.org
If the SET or WHERE clause of an INSERT ... ON CONFLICT command
references EXCLUDED.col, where col is a virtual generated column, the
column was not properly expanded, leading to an "unexpected virtual
generated column reference" error, or incorrect results.

The problem was that expand_virtual_generated_columns() would expand
virtual generated columns in both the SET and WHERE clauses and in the
targetlist of the EXCLUDED pseudo-relation (exclRelTlist). Then
fix_join_expr() from set_plan_refs() would turn the expanded
expressions in the SET and WHERE clauses back into Vars, because they
would be found to match the expression entries in the indexed tlist
produced from exclRelTlist.

To fix this, arrange for expand_virtual_generated_columns() to not
expand virtual generated columns in exclRelTlist. This forces
set_plan_refs() to resolve generation expressions in the query using
non-virtual columns, as required by the executor.

In addition, exclRelTlist now always contains only Vars. That was
something already claimed in a couple of existing comments in the
planner, which relied on that fact to skip some processing, though
those did not appear to constitute active bugs.

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDf7wTLz_vqb1wi1EJ_4Uh+Vxm75+b4c-Ky=6P+yOAHjbQ@mail.gmail.com
Backpatch-through: 18
Formerly, attempting to use WHERE CURRENT OF to update or delete from
a table with virtual generated columns would fail with the error
"WHERE CURRENT OF on a view is not implemented".

The reason was that the check preventing WHERE CURRENT OF from being
used on a view was in replace_rte_variables_mutator(), which presumed
that the only way it could get there was as part of rewriting a query
on a view. That is no longer the case, since replace_rte_variables()
is now also used to expand the virtual generated columns of a table.

Fix by doing the check for WHERE CURRENT OF on a view at parse time.
This is safe, since it is no longer possible for the relkind to change
after the query is parsed (as of b23cd18).

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDc_TwzSgb=B_QgNLt3mvZdmRK23rLb+RkanSQkDF40GjA@mail.gmail.com
Backpatch-through: 18
When using ALTER TABLE ... MERGE PARTITIONS or ALTER TABLE ... SPLIT
PARTITION, extension dependencies on partition indexes were being lost.
This happened because the new partition indexes are created fresh from
the parent partitioned table's indexes, while the old partition indexes
(with their extension dependencies) are dropped.

Fix this by collecting extension dependencies from source partition
indexes before detaching them, then applying those dependencies to the
corresponding new partition indexes after they're created.  The mapping
between old and new indexes is done via their common parent partitioned
index.

For MERGE operations, all source partition indexes sharing a parent
partitioned index must have the same extension dependencies; if they
differ, an error naming both conflicting partition indexes is raised.
The check is implemented by collecting one entry per partition index,
sorting by parent index OID, and comparing adjacent entries in a single
pass.  This is order-independent: the same set of partitions produces
the same decision regardless of the order they are listed in the MERGE
command, and subset mismatches are caught in both directions.

For SPLIT operations, the new partition indexes simply inherit all
extension dependencies from the source partition's index.

The regression tests exercising this feature live under
src/test/modules/test_extensions, where the test_ext3 and test_ext5
extensions are available; core regression tests cannot assume any
particular extension is installed.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dmitry Koval <d.koval@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/CALdSSPjXtzGM7Uk4fWRwRMXcCczge5uNirPQcYCHKPAWPkp9iQ%40mail.gmail.com
This function writes into a caller-supplied buffer of length
2 * MAXNORMLEN, which should be plenty in real-world cases.
However a malicious affix file could supply an affix long
enough to overrun that.  Defend by just rejecting the match
if it would overrun the buffer.  I also inserted a check of
the input word length against Affix->replen, just to be sure
we won't index off the buffer, though it would be caller error
for that not to be true.

Also make the actual copying steps a bit more readable, and remove
an unnecessary requirement for the whole input word to fit into the
output buffer (even though it always will with the current caller).

The lack of documentation in this code makes my head hurt, so
I also reverse-engineered a basic header comment for CheckAffix.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/641711.1776792744@sss.pgh.pa.us
Backpatch-through: 14
parse_affentry() and addCompoundAffixFlagValue() each collect fields
from an affix file into working buffers of size BUFSIZ.  They failed
to defend against overlength fields, so that a malicious affix file
could cause a stack smash.  BUFSIZ (typically 8K) is certainly way
longer than any reasonable affix field, but let's fix this while
we're closing holes in this area.

I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield().  Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.

Reported-by: Igor Stepansky <igor.stepansky@orca.security>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/864123.1776810909@sss.pgh.pa.us
Backpatch-through: 14
to_char() allocates its output buffer with 8 bytes per formatting
code in the pattern.  If the locale's currency symbol, thousands
separator, or decimal or sign symbol is more than 8 bytes long,
in principle we could overrun the output buffer.  No such locales
exist in the real world, so it seems sufficient to truncate the
symbol if we do see it's too long.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/638232.1776790821@sss.pgh.pa.us
Backpatch-through: 14
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  Most of
these inconsistencies were introduced during Postgres 19 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1f).
Commit 7a1f0f8 optimized the slot verification query but
overlooked cases where all logical replication slots are already
invalidated. In this scenario, the CTE returns no rows, causing the
main query (which used a cross join) to return an empty result even
when invalid slots exist.

This commit fixes this by using a LEFT JOIN with the CTE, ensuring
that slots are properly reported even if the CTE returns no rows.

Author: Lakshmi N <lakshmin.jhs@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+3i_M8eT6j8_cBHkYykV-SXCxbmAxpVSKptjDVq+MFtpT-Paw@mail.gmail.com
The problem report was about setting GUCs in the startup packet for a
physical replication connection. Setting the GUC required an ACL
check, which performed a lookup on pg_parameter_acl.parname. The
catalog cache was hardwired to use DEFAULT_COLLATION_OID for
texteqfast() and texthashfast(), but the database default collation
was uninitialized because it's a physical walsender and never connects
to a database. In versions 18 and later, this resulted in a NULL
pointer dereference, while in version 17 it resulted in an ERROR.

As the comments stated, using DEFAULT_COLLATION_OID was arbitrary
anyway: if the collation actually mattered, it should have used the
column's actual collation. (In the catalog, some text columns are the
default collation and some are "C".)

Fix by using C_COLLATION_OID, which doesn't require any initialization
and is always available. When any deterministic collation will do,
it's best to consistently use the simplest and fastest one, so this is
a good idea anyway.

Another problem was raised in the thread, which this commit doesn't
fix (see second discussion link).

Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/D18AD72A-5004-4EF8-AF80-10732AF677FA@yandex-team.ru
Discussion: https://postgr.es/m/4524ed61a015d3496fc008644dcb999bb31916a7.camel%40j-davis.com
Backpatch-through: 17
@NikolayS NikolayS force-pushed the claude/verify-git-history-PaBip branch from ed6777d to 66e66fd Compare April 22, 2026 17:37
NikolayS and others added 7 commits April 22, 2026 11:19
Generated from full postgres/postgres history (upstream/master, 64,103
commits). Bruce Momjian has 14,078 commits matching the screenshot, but
ranks #2 — Tom Lane leads with 16,755.

https://claude.ai/code/session_01SAWhyKXpDkFpwPiCP3EZ7T
Three stacked views of his 14,078 commits:
- bruce_by_topdir.png: top-level directory touched (doc/ and src/ dominate)
- bruce_by_ext.png: file extension (.c, .sgml, .h, Makefile, ...)
- bruce_by_category.png: subject-based category — release notes, pgindent,
  copyright, translation, typo/comment, docs, fix, revert, merge, other

https://claude.ai/code/session_01SAWhyKXpDkFpwPiCP3EZ7T
The 'other' bucket dropped from 9,070 -> 1,856 by adding rules for TODO-list
entries (Add:/Done:/Update:/Remove:), pg_upgrade, win32/port, autoconf/build,
and broader feature/cleanup/docs verbs (mention, clarify, move, change,
remove, mark, attached-patch era, etc.). Added probe/sample helpers.

https://claude.ai/code/session_01SAWhyKXpDkFpwPiCP3EZ7T
@NikolayS NikolayS force-pushed the claude/verify-git-history-PaBip branch from 66e66fd to 07a05bc Compare April 22, 2026 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.