Add index-only scan counters to index stats by NikolayS · Pull Request #35 · NikolayS/postgres

NikolayS · 2026-04-30T05:55:21Z

Adds cumulative index-only scan counters to pg_stat_all_indexes:

idx_only_scan — number of index-only searches on the index
idx_only_tup_read — index entries returned by index-only scans
idx_only_heap_fetch — index-only scan visibility-map misses requiring heap visits

Implementation notes:

Reuses index scan instrumentation to count IOS searches.
Counts IOS index entries in nodeIndexonlyscan.c alongside index_getnext_tid().
Counts IOS heap fetches at the existing EXPLAIN Heap Fetches hook point.
Adds regression coverage for both all-visible IOS (heap_fetch = 0) and VM-miss IOS (heap_fetch = 1).

Local validation:

ninja -C build -j2
meson test -C build regress/regress --print-errorlogs — 245 subtests passed
git diff --check

Commit 7c64d56 has removed the isolation test providing coverage for lock statistics due to some instability in the CI, where the deadlock timeout may not have enough time to process, preventing the stats data to be updated. These also relied on a set of hardcoded sleeps. This commit switches the test suite to TAP, instead, that uses an injection point with a wait to avoid the sleeps. The injection point is added in ProcSleep(), once we know that the deadlock timeout has fired and that the stats have been updated. Multiple lock patterns are checked, all rely on the same workflow, with two sessions: - session 1 holds a given lock type. - session 2 attaches to the new injection point with the wait action. - session 2 attempts to acquire a lock conflicting with the lock of session 1, waiting for the injection point to be reached. - session 1 releases its lock, session 2 commits. - pg_stat_lock is polled until the counters are updated for the lock type. Bertrand's version of the patch introduced a new routine to BackgroundPsql() to detect the blocked background sessions. I have tweaked the test so as we use the same method as some of the other tests instead, based on some \echo commands. This test has been run multiple times in the CI, all passing, so I'd like to think that this is more stable than the first version attempted. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/acNTR1lLHwQJ0o+P@ip-10-97-1-34.eu-west-3.compute.internal

If the background worker for processing databases manages to finish before the launcher starts waiting for it, the launcher would treat it erroneously as an error. Fix by ensureing to check result state in this case. Identified on CI and synthetically reproduced during local testing. Also while, make sure to properly lock the shared memory structure before updating tje result state. Author: Daniel Gustafsson <daniel@yesql.seA Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/4fxw37ge47v5baeozla5phymi233hxbcjbwwsfwv3mpg3kyl2z@6jk4nkf6jp4

Postcommit review and buildfarm/CI failures revealed a few issues in the test code which this commit attempts to resolve. These failures are verified using synthetic means. * Wait for launcher exit in enable/disable checksum tests When enabling or disabling data checksums in a test with waiting for an end state (on or off), the test typically want to perform more test against the cluster immediately. Make sure to wait for the launcher to exit in these cases before returning in order to know it can immediately be acted on. This is a more generic way of implementating 0036232. * Refactor injection point tests to use the injection_points test extension. Two injection points added for online checksums were better expressed using the injection_points extension with the test code embedded in datachecksum_state.c. * Make tests less timing dependent and allow transitions to "on" and not just "inprogress-on" in case a test manages to finish before it's checked for state. * When waiting on a blocking background psql keeping a temporary table open, the test first closed the background session abd then the server. This could cause data checksums to manage to get enabled in the brief window between dropping the temporary table and closing the server. Fix by closing the server first before the background session. * Remove a few superfluous duplicate checks and general cleanup of comments as well as making LSN logging consistent. These issues were reported by Andres as well as spotted in the buildfarm and on CI. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/92F25C14-801E-4198-994D-D83E31FEB0D8@yesql.se

On MSVC Arm, USE_ARMV8_CRC32C is defined, but __builtin_constant_p is not available. Use pg_integer_constant_p and add appropriate guards. There is a similar potential hazard for the x86 path, but for now let's get the buildfarm green. Oversight in commit fbc57f2, per buildfarm member hoatzin.

…ation Previously, during shutdown, walsenders always waited until all pending data was replicated to receivers. This ensures sender and receiver stay in sync after shutdown, which is important for physical replication switchovers, but it can significantly delay shutdown. For example, in logical replication, if apply workers are blocked on locks, walsenders may wait until those locks are released, preventing shutdown from completing for a long time. This commit introduces a new GUC, wal_sender_shutdown_timeout, which specifies the maximum time a walsender waits during shutdown for all pending data to be replicated. When set, shutdown completes once all data is replicated or the timeout expires. A value of -1 (the default) disables the timeout. This can reduce shutdown time when replication is slow or stalled. However, if the timeout is reached, the sender and receiver may be left out of sync, which can be problematic for physical replication switchovers. Author: Andrey Silitskiy <a.silitskiy@postgrespro.ru> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Takamichi Osumi <osumi.takamichi@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru> Reviewed-by: Ronan Dunklau <ronan@dunklau.fr> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/TYAPR01MB586668E50FC2447AD7F92491F5E89@TYAPR01MB5866.jpnprd01.prod.outlook.com

When determining if it is safe to use an expression as a grouping key for partial aggregation, eager aggregation relies on the B-tree equalimage support function to ensure that equality implies image equality. Previously, the code incorrectly passed the default collation of the expression's data type to the equalimage procedure, rather than the expression's actual collation. As a result, if a column used a non-deterministic collation but the base type's default collation was deterministic, eager aggregation would incorrectly assume that the column was safe for byte-level grouping. This could cause rows to be prematurely grouped and subsequently discarded by strict join conditions, resulting in incorrect query results. This patch fixes the issue by passing the expression's actual collation to the equalimage procedure. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://postgr.es/m/CAMbWs48A53PY1Y4zoj7YhxPww9fO1hfnbdntKfA855zpXfVFRA@mail.gmail.com

Pushing aggregates containing volatile functions below a join can violate volatility semantics by changing the number of times the function is executed. Here we check the Aggref nodes in the targetlist and havingQual for volatile functions and disable eager aggregation when such functions are present. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://postgr.es/m/CAMbWs48A53PY1Y4zoj7YhxPww9fO1hfnbdntKfA855zpXfVFRA@mail.gmail.com

…ion test Previously the stats.sql regression test used conditions like "datname = (SELECT current_database())" to check the current database name. The subquery is unnecessary, so this commit simplifies these expressions to "datname = current_database()". Author: Chao Li <lic@highgo.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/A1535A8F-65AF-4C3D-ACBE-25891CB5D38B@gmail.com

Alexander Lakhin has noticed that it can be possible on machines with slow storage to have the spawned workers be stuck in initialize_worker_spi(), before they reach their main loop. Waiting for a flush to happen would block the interrupt attempts done by the database commands, causing the test to fail on timeout once the number of interrupt attempts is reached in CountOtherDBBackends(). This commit switches the test to wait for the spawned bgworkers to reach their main loops before attempting the database commands that would trigger the interrupts, napping for a time larger than the default, with worker_spi.naptime set at 10 minutes. Another thing that could be attempted is to enforce a larger number of tries in CountOtherDBBackends(), if what is done here is not enough. Let's see first if what this commit does is enough for the buildfarm members widowbird and jay. Analyzed-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/f913fba1-da59-404c-9eb3-07c7304be637@gmail.com

Previously, one LWLock was used for each lock type, adding complexity without an observable performance benefit as data is gathered only for paths involving lock waits, at least currently. This commit replaces the per-type set of LWLocks with a single LWLock protecting the stats data of all the lock types, like the stats kinds for SLRU or WAL. A good chunk of the callbacks get simpler thanks to this change. The previous approach also had one bug in the flush callback when nowait was called with "true": a backend iterating over all entries could successfully flush some entries while skipping others due to contention, then unconditionally reset the pending data. This would cause some stats data loss. Oversight in 4019f72. Reported-by: Tomas Vondra <tomas@vondra.me> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/1af63e6d-16d5-4d5b-9b03-11472ef1adf9@vondra.me

This module allows plan advice strings to be provided automatically from an in-memory advice stash. Advice stashes are stored in dynamic shared memory and must be recreated and repopulated after a server restart. If pg_stash_advice.stash_name is set to the name of an advice stash, and if query identifiers are enabled, the query identifier for each query will be looked up in the advice stash and the associated advice string, if any, will be used each time that query is planned. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: http://postgr.es/m/CA+TgmoaeNuHXQ60P3ZZqJLrSjP3L1KYokW9kPfGbWDyt+1t=Ng@mail.gmail.com

Some compilers didn't like the empty initializer when compiled without USE_INJECTION_POINTS. Per buildfarm member 'drongo', using Visual Studio 2019. Author: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/adNHcBVJO5gIOp1l@paquier.xyz

When freeing pending_shmem_requests we should also free the ->options. Author: Aleksander Alekseev <aleksander@tigerdata.com> Discussion: https://www.postgresql.org/message-id/CAJ7c6TN9tp8MTc0WXM0zfSWqjfBqU8gpe+o5KqHB1-cQ7409Kw@mail.gmail.com

Child processes do not need the postmaster's working memory context and normally release it at the start of their main entry point. However, the slotsync worker forgot to do so. This commit makes the slotsync worker release the postmaster's working memory context at startup, preventing unintended use. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Tiancheng Ge <getiancheng_2012@163.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAHGQGwHO05JaUpgKF8FBDmPdBUJsK22axRRcgmAUc2Jyi8OK8g@mail.gmail.com

This commit updates 011_lock_stats.pl to verify log_lock_waits behavior. The tests check that messages are emitted both when a wait occurs and when the lock is acquired, and that the "still waiting for" message is logged exactly once per wait, even if the backend wakes up during the wait. The latter covers the behavior introduced by commit fd6ecbf. Author: Hüseyin Demir <huseyin.d3r@gmail.com> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAB5wL7YB1my9W5k5i=SY+=sTjeozyJ0YkvGXrVfeDNzuRkoTPg@mail.gmail.com

Previously, this logic was embedded within SplitIdentifierString, SplitDirectoriesString, and SplitGUCList. Factoring it out saves a bit of duplicated code, and also makes it available to extensions that might want to do similar things without necessarily wanting to do exactly the same thing. Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com

The restructuring in commit 53b8ca6 revealed an interesting corner case: if a table needs vacuuming for wraparound prevention and autovacuum is disabled for it, we might still choose to analyze it. Research seems to indicate this was an accidental addition by commit 48188e1, and further discussion indicates there is consensus that it is unnecessary and can be removed. Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Discussion: https://postgr.es/m/adB9nSsm_S0D9708%40nathan

It would be useful to be able to tell auto_explain to set a custom EXPLAIN option, but it would be bad if it tried to do so and the option name or value wasn't valid, because then every query would fail with a complaint about the EXPLAIN option. So add a guc_check_handler that auto_explain will be able to use to only try to set option name/value/type combinations that have been determined to be legal, and to emit useful messages about ones that aren't. Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com

This code missed the need to update the combined state's nullbitmap if state1 already had a bitmap but state2 didn't. We need to extend the existing bitmap with 1's but didn't. This could result in wrong output from a parallelized array_agg(anyarray) calculation, if the input has a mix of null and non-null elements. The errors depended on timing of the parallel workers, and therefore would vary from one run to another. Also install guards against integer overflow when calculating the combined object's sizes, and make some trivial cosmetic improvements. Author: Dmytro Astapov <dastapov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAFQUnFj2pQ1HbGp69+w2fKqARSfGhAi9UOb+JjyExp7kx3gsqA@mail.gmail.com Backpatch-through: 16

contrib/pg_stash_advice and src/test/modules/test_shmem missed these, leading to complaints from git after an in-tree check-world run. Use our standard boilerplate list of ignorable subdirectories, although the two modules presently create different subsets of that.

These columns haven't been computed yet when the filtering happens (since we've not written the candidate tuple into the table); so any check on them is wrong or useless. Worse, since aa606b9 such a reference results in an access off the end of a TupleDesc, potentially causing a phony "generated columns are not supported in COPY FROM WHERE conditions" error; and since c98ad08 it throws an Assert instead. Actually we could allow tableoid, which has been set to the OID of the table named as the COPY target. However, plausible uses for tests of tableoid would involve a partitioned target table, and the user would wish it to read as the OID of the destination partition. There has been some discussion of changing things to make it work like that, but pending that happening we should just disallow tableoid along with other system columns. It seems best though to install this prohibition only in HEAD. In the back branches we'll just guard the unsafe TupleDesc access, and people will keep getting whatever semantics they got before. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/6f435023-8ab6-47c2-ba07-035d0c4212f9@gmail.com

CLUSTER is no longer the favored way to invoke this functionality, and the code is about to shift its focus to the REPACK more ambitiously. Rename the file to avoid leaving an unnecessary historical artifact around. Author: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/202603271635.owyhm7btgoic@alvherre.pgsql

Previously, autovacuum always disabled parallel vacuum regardless of the table's index count or configuration. This commit enables autovacuum workers to use parallel index vacuuming and index cleanup, using the same parallel vacuum infrastructure as manual VACUUM. Two new configuration options control the feature. The GUC autovacuum_max_parallel_workers sets the maximum number of parallel workers a single autovacuum worker may launch; it defaults to 0, preserving existing behavior unless explicitly enabled. The per-table storage parameter autovacuum_parallel_workers provides per-table limits. A value of 0 disables parallel vacuum for the table, a positive value caps the worker count (still bounded by the GUC), and -1 (the default) defers to the GUC. To handle cases where autovacuum workers receive a SIGHUP and update their cost-based vacuum delay parameters mid-operation, a new propagation mechanism is added to vacuumparallel.c. The leader stores its effective cost parameters in a DSM segment. Parallel vacuum workers poll for changes in vacuum_delay_point(); if an update is detected, they apply the new values locally via VacuumUpdateCosts(). A new test module, src/test/modules/test_autovacuum, is added to verify that parallel autovacuum workers are correctly launched and that cost-parameter updates are propagated as expected. The patch was originally proposed by Maxim Orlov, but the implementation has undergone significant architectural changes since then during the review process. Author: Daniil Davydov <3danissimo@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: zengman <zengman@halodbtech.com> Discussion: https://postgr.es/m/CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com

transformCreateSchemaStmtElements has always believed that it is supposed to re-order the subcommands of CREATE SCHEMA into a safe execution order. However, it is nowhere near being capable of doing that correctly. Nor is there reason to think that it ever will be, or that that is a well-defined requirement. (The SQL standard does say that it should be possible to do foreign-key forward references within CREATE SCHEMA, but it's not clear that the text requires anything more than that.) Moreover, the problem will get worse as we add more subcommand types. Let's just drop the whole idea and execute the commands in the order given, which seems like a much less astonishment-prone definition anyway. The foreign-key issue will be handled in a follow-up patch. This will result in a release-note-worthy incompatibility, which is that forward references like CREATE SCHEMA myschema CREATE VIEW myview AS SELECT * FROM mytable CREATE TABLE mytable (...); used to work and no longer will. Considering how many closely related variants never worked, this isn't much of a loss. Along the way, pass down a ParseState so that we can provide an error cursor for "wrong schema name" and related errors, and fix transformCreateSchemaStmtElements so that it doesn't scribble on the parsetree passed to it. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us

The previous patch simplified CREATE SCHEMA's behavior to "execute all subcommands in the order they are written". However, that's a bit too simple, as the spec clearly requires forward references in foreign key constraint clauses to work, see feature F311-01. (Most other SQL implementations seem to read more into the spec than that, but it's not clear that there's justification for more in the text, and this is the only case that doesn't introduce unresolvable issues.) We never implemented that before, but let's do so now. To fix it, transform FOREIGN KEY clauses into ALTER TABLE ... ADD FOREIGN KEY commands and append them to the end of the CREATE SCHEMA's subcommand list. This works because the foreign key constraints are independent and don't affect any other DDL that might be in CREATE SCHEMA. For simplicity, we do this for all FOREIGN KEY clauses even if they would have worked where they were. Author: Jian He <jian.universality@gmail.com> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1075425.1732993688@sss.pgh.pa.us

Having rejected the principle that we should know how to re-order the sub-commands of CREATE SCHEMA, there is not really anything except a little coding to stop us from supporting more object types. This patch adds support for creating functions (including procedures and aggregates), operators, types (including domains), collations, and text search objects. SQL:2021 specifies that we should allow functions, procedures, types, domains, and collations, so this moves us a great deal closer to full SQL compatibility of CREATE SCHEMA. What remains missing from their list are casts, transforms, roles, and some object types we don't support yet (e.g. CREATE CHARACTER SET). Supporting casts or transforms would be problematic because they don't have names at all, let alone schema-qualified names, so it'd be quite a stretch to say that they belong to a schema. Roles likewise are not schema-qualified, plus they are global to a cluster, making it even less reasonable to consider them as belonging to a schema. So I don't see us trying to complete the list. User-defined aggregates and operators are outside the spec's ken, as are text search objects, so adding them does not do anything for spec compatibility. But they go along with these other object types, plus it takes no additional code to support them since they are represented as DefineStmts like some variants of CREATE TYPE. It would indeed take some effort to reject them. Author: Kirill Reshke <reshkekirill@gmail.com> Author: Jian He <jian.universality@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CALdSSPh4jUSDsWu3K58hjO60wnTRR0DuO4CKRcwa8EVuOSfXxg@mail.gmail.com

The associated value should look like something that could be part of an EXPLAIN options list, but restricted to EXPLAIN options added by extensions. For example, if pg_overexplain is loaded, you could set auto_explain.log_extension_options = 'DEBUG, RANGE_TABLE'. You can also specify arguments to these options in the same manner as normal e.g. 'DEBUG 1, RANGE_TABLE false'. Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com

This function is a thin wrapper around relation_needs_vacanalyze() that handles fetching and freeing the pgstat entry for the table. Since all callers of relation_needs_vacanalyze() do that anyway, we can teach that function to fetch/free the pgstat entry and use it instead. Suggested-by: Álvaro Herrera <alvherre@kurilemu.de> Author: Sami Imseih <samimseih@gmail.com> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s4xjMrB-VAnLccC7kY8d0-4806-Lsac-czJsdA1LXtAw%40mail.gmail.com

Use TupleDescInitBuiltinEntry instead of TupleDescInitEntry when building the result tuple descriptor for the WAIT FOR command. This avoids a syscache access that could re-establish a catalog snapshot after we've explicitly released all snapshots before the wait. Discussion: https://postgr.es/m/CABPTF7U%2BSUnJX_woQYGe%3D%3DR9Oz%2B-V6X0VO2stBLPGfJmH_LEhw%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>

When the standby is passed as a PostgreSQL::Test::Cluster instance, use the WAIT FOR LSN command on the standby server to implement wait_for_catchup() for replay, write, and flush modes. This is more efficient than polling pg_stat_replication on the upstream, as the WAIT FOR LSN command uses a latch-based wakeup mechanism. The optimization applies when: - The standby is passed as a Cluster object (not just a name string) - The mode is 'replay', 'write', or 'flush' (not 'sent') Rather than pre-checking pg_is_in_recovery() on the standby (which would add an extra round-trip on every call), we issue WAIT FOR LSN directly and handle the 'not in recovery' result as a signal to fall back to polling. For 'sent' mode, when the standby is passed as a string (e.g., a subscription name for logical replication), when the standby has been promoted, or when WAIT FOR LSN is interrupted by a recovery conflict, the function falls back to the original polling-based approach using pg_stat_replication on the upstream. The recovery conflict fallback is necessary because some conflicts are unavoidable - for example, ResolveRecoveryConflictWithTablespace() kills all backends unconditionally, regardless of what they are doing. The recovery conflict detection matches the English error message "conflict with recovery", which is reliable because the test suite runs with LC_MESSAGES=C. Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>

pg_test_timing reports timing differences in nanoseconds in master, and in microseconds in v14 through v18, but previously the backward-clock warning incorrectly labeled the value as milliseconds. This commit fixes the warning message to use "ns" in master and "us" in v14 through v18, matching the actual unit being reported. Backpatch to all supported versions. Author: Chao Li <lic@highgo.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Xiaopeng Wang <wxp_728@163.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/F780CEEB-A237-4302-9F55-60E9D8B6533D@gmail.com Backpatch-through: 14

ExecEvalHashedScalarArrayOp(), when using a strict equality function, performs a short-circuit when looking up NULL values. When the function is non-strict, the code incorrectly looked up the hash table for a zero-valued Datum, which could have resulted in an accidental true return if the hash table contained zero valued Datum, or could result in a crash for non-byval types. Here we fix this by adding an extra step when we build the hash table to check what the result of a NULL lookup would be. This requires looping over the array and checking what the non-hashed version of the code would do. We cache the results of that in the expression so that we can reuse the result any time we're asked to search for a NULL value. It's important to note that non-strict equality functions are free to treat any NULL value as equal to any non-NULL value. For example, someone may wish to design a type that treats an empty string and NULL as equal. All built-in types have strict equality functions, so this could affect custom / user-defined types. Author: Chengpeng Yan <chengpeng_yan@outlook.com> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: ChangAo Chen <cca5507@qq.com> Discussion: https://postgr.es/m/A16187AE-2359-4265-9F5E-71D015EC2B2D@outlook.com Backpatch-through: 14

Commit 0b096e3 changed pg_test_timing to measure timing differences in nanoseconds instead of microseconds, but the resulting deltas continued to be stored in int32. That can overflow for large gaps (for example, values greater than about 2.14 seconds in nanoseconds), leading to truncation or incorrect output. This commit fixes the issue by storing measured timing deltas in int64. This prevents overflow for large values and better matches nanosecond-resolution measurements. Author: Chao Li <lic@highgo.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Xiaopeng Wang <wxp_728@163.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/F780CEEB-A237-4302-9F55-60E9D8B6533D@gmail.com

generate_queries_for_path_pattern_recurse() and generate_setop_from_pathqueries() are recursive functions. For a property graph with hundreds of tables, a graph pattern with a handful element patterns can cause stack overflow. Fix it by calling check_stack_depth() at the beginning of these functions. Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAHg+QDfgK0xddH8f3eAb+UVn7sBDOnv8RvM6OkP4HtHAt6aD7w@mail.gmail.com

Reported-by: Lakshmi N <lakshmin.jhs@gmail.com> Author: Lakshmi N <lakshmin.jhs@gmail.com> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CA+3i_M9gpUGjH-BkJk=UFjK16jq9fEQHpmZ1cxpJO+xM4hWC+A@mail.gmail.com

GRAPH_TABLE clause is converted into a rangetable entry, which is ignored by assign_query_collations(). Hence we assign collations while transforming its parts. But expressions in COLUMNS clause missed that treatment, so fix that. While at it, also add comments about collation assignment to the parts of GRAPH_TABLE clause, and also fix a small grammar issue. Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAHg+QDc4aaiufYSgrwMMPMMRTPtQ66SghcrPFbWJFZMqNaG+BA@mail.gmail.com

Expressions in GRAPH_TABLE COLUMNS list may have lateral references. get_rule_expr() requires lateral namespaces to deparse such references. get_from_clause_item() does not pass them when processing the expressions in COLUMNS list causing ERROR "bogus varlevelsup: 0 offset 0". Fix get_from_clause_item() to pass input deparse_context containing lateral namespaces to get_rule_expr() instead of the dummy context. Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHg%2BQDcLVa2iBnggkHxY4itZbXtDMfsYHEjnCUYe9hNbnxDi-w%40mail.gmail.com

We need to create top-level targets to run targets with the ninja command like `ninja <target_name>`. Some targets (man, html, ...) have the same target name on both top-level and custom target. This creates a confusion for the meson build: $ meson compile -C build html ``` ERROR: Can't invoke target `html`: ambiguous name. Add target type and/or path: - ./doc/src/sgml/html:custom - ./doc/src/sgml/html:alias ``` Solve that problem by adding '-custom' suffix to these problematic targets' custom target names. Top-level targets can be called with both meson and ninja now: $ meson compile -C build html $ ninja -C build html Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Suggested-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/5508e572-79ae-4b20-84d0-010a66d077f2%40eisentraut.org

British Columbia (America/Vancouver) moved to permanent UTC-07 on 2026-03-09, which will affect their clocks beginning on 2026-11-01. For lack of any clarity on the point, assume their TZ abbreviation will be MST from that time forward. Moldova (Europe/Chisinau) has followed EU DST transition times since 2022. Backpatch-through: 14

Use PRId64 instead.

Quote pg_hosts.conf fields derived from the build directory, since hba.c:next_token() treats a comma as a token separator. Commit 4f43302 introduced pg_hosts.conf and this test. A build directory name containing a comma worked before that commit. A build directory name containing a quote character has not worked, so don't handle that. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/20260426213252.7a@rfd.leadboat.com

These are old leaks, that can pile up if a WAL receiver stays alive, waiting for new WAL data after the sender has switched to a new timeline. While this is technically a bug, the impact is minimal and would only become noticeable if the WAL sender handles a lot of timeline switches, so no backpatch is done. Note that in most cases, primary_conninfo would be updated in a standby to point to a new sender, meaning a restart of the WAL receiver. Let's be clean on HEAD, though. Author: DaeMyung Kang <charsyam@gmail.com> Discussion: https://postgr.es/m/20260426170100.847923-1-charsyam@gmail.com Discussion: https://postgr.es/m/20260426170219.849330-1-charsyam@gmail.com

remove_self_join_rel() called adjust_relid_set() on all_result_relids and leaf_result_relids but threw away the return value. Since adjust_relid_set() returns a freshly-built Relids and does not modify the input in place, the calls did nothing. This has been the case since the SJE feature went in (commit fc069a3). There has been no observable misbehavior, because the relid being passed is guaranteed not to be a member of either set. At the point remove_self_join_rel() runs, those sets contain only resultRelation; inheritance children have not been added yet, as that happens later in query_planner(), in expand_single_inheritance_child() called from add_other_rels_to_query(). And remove_self_joins_recurse() rejects parse->resultRelation as an SJE candidate to preserve the EvalPlanQual mechanism. Even with the result assigned, the calls would be no-ops in practice. Rather than make the calls do the cleanup they pretend to do, replace them with assertions of the invariant. Any future loosening of the SJE candidate filter -- for instance to allow eliminating a result relation under provable conditions -- will trip the assertion and force whoever does it to revisit this code. Additionally, decorate adjust_relid_set() with pg_nodiscard so that any future accidental discard of its return value is caught at compile time. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAMbWs49fYQcqJfJ_Gtn8r1GFNoYtb1=2AUab4ieuqY4Zid9ocQ@mail.gmail.com

Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PuvY_wYLPJ4DTs7NE9Lu2ty4d-OgZAOJC-NvCM=2wwcQQ@mail.gmail.com Backpatch-through: 14

Previously, these test cases would give internal errors or crash. The fix is to add some missing fields of ForPortionOfExpr to expression_tree_walker. Author: jian he <jian.universality@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Discussion: https://postgr.es/m/CACJufxHs1Hs00EqsZ4NbuAjmYzMzjJyP1sAj12Ne=cBsEVmQOA@mail.gmail.com

Similarly to logical replication, REPACK CONCURRENTLY needs to ability to reliably locate a tuple based on an identity. A replica identity index is okay. Primary keys normally also are, except when they are deferrable, because a tuple being modified might not yet be indexed, causing REPACK to fail. Change the REPACK CONCURRENTLY code to use GetRelationIdentityOrPK(), similar to what the logical replication code does. (Though we don't yet support locating tuples based on arbitrary indexes for replica identity FULL.) While at it, add a few more test cases for situations that aren't supported by REPACK, to improve coverage. Author: Chao Li <lic@highgo.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Yuchen Li <liyuchen_xyz@163.com> Discussion: https://postgr.es/m/10DD5E13-B45D-44F1-BE08-C63E00ABCAC0@gmail.com

Use BoolGetDatum() instead of Int32GetDatum() when storing the boolean subretentionactive column in pg_subscription. This was an oversight in a850be2. Author: Lakshmi N <lakshmin.jhs@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Discussion: https://postgr.es/m/CA+3i_M98-XjE-_fw0p+8xOnw64y2_YLtJfcwvCfsVMn-z2ZjGg@mail.gmail.com

When a subscription has retain_dead_tuples enabled and maxretention is zero (unlimited), adjust_xid_advance_interval() mistakenly caps xid_advance_interval to zero. This zero interval forces get_candidate_xid() to evaluate TimestampDifferenceExceeds() as always true, causing the apply worker to call GetOldestActiveTransactionId() for every WAL message. This leads to unnecessary ProcArrayLock acquisitions. Fix this by only capping the interval when maxretention > 0, allowing the exponential back-off to function properly. Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Discussion: https://postgr.es/m/CAHg+QDdKVnCLHot=AcoPpEiSyDzGz7wGYjAFHVOw57oDtmUDWQ@mail.gmail.com

Oversight in commit 9303d62. Author: Aleksander Alekseev <aleksander@tigerdata.com> Discussion: https://postgr.es/m/CAJ7c6TOsKmmgyA6EwxKVsNeHFHrWXYdgZivgjo_ujf890BpeeA@mail.gmail.com

Do minor comment fixes and remove implicit cast to Datum. While here, let's prefer crashing instead of entering an infinite loop in case of future programming mistakes when computing next_level, suggested by ChangAo Chen. Discussion: https://postgr.es/m/tencent_49E3F11E74D8A584A2144ED532A490CBC40A@qq.com

btestimateparallelscan neglected to add btps_arrElems[] space overhead for skip array scan keys that were later output by nbtree preprocessing. Skip arrays don't actually need to use this space, but a scan with a subsequent SAOP array will need to subscript btps_arrElems[] using a simple so->arrayKeys[]-wise offset. so->arrayKeys[] has entries for both kinds of arrays. As a result of this oversight, it was possible for an index scan with a skip array and a lower-order SAOP array to write past the allocated shared memory boundary when storing the SAOP array's cur_elem. In practice the problem seems to be limited to scans with many skipped index columns, since our general approach to estimating the amount of shared memory that will be required is fairly conservative. To fix, have btestimateparallelscan request an extra sizeof(int) space for key columns that might require a skip array later on. Oversight in commit 92fe23d, which added the nbtree skip scan optimization. Author: Siddharth Kothari <sidkot@google.com> Discussion: https://postgr.es/m/CAGCUe0Lwk3C0qdkBa+OLpYc7yXwW=pbaz8Sju4xMXEQAmyp+5g@mail.gmail.com Backpatch-through: 18

The regression tests for pg_get_role_ddl(), pg_get_database_ddl(), and pg_get_tablespace_ddl() created databases and tablespaces, which are heavyweight operations. As noted by Andres Freund, this is wasteful in the core regression suite which gets run repeatedly. Convert the three test files (role_ddl.sql, database_ddl.sql, tablespace_ddl.sql) into a single TAP test that runs once, covering all the same functionality: basic DDL generation, pretty-printing, option handling, error cases, permission checks, and edge cases like quoted names and role memberships. Discussion: https://postgr.es/m/5c67dc79-909a-4e17-8606-6686667da6c6@dunslane.net

The tests introduced in c529ee3 are timezone sensitive. Pin the cluster's timezone to UTC at init time so timestamptz output is deterministic regardless of the host's local timezone.

TidStoreSetBlockOffsets() requires its offsets array to be strictly ascending and asserts this precondition. In test_tidstore, we were passing random offset numbers deduplicated by a DISTINCT clause in an array_agg() call directly to the do_set_block_offsets() test harness. However, DISTINCT without an ORDER BY clause does not guarantee sorted results according to the SQL standard. Fix this by sorting the offsets in-place inside do_set_block_offsets() before calling TidStoreSetBlockOffsets(). While this assertion failure is not observed during regular regression tests because they use queries simple enough that the optimizer consistently chooses plans yielding sorted results, it makes sense to stabilize the test. The failure could theoretically occur depending on the optimizer's plan choice, and has been reported when experimenting with certain third-party extensions. Backpatch to v17, where test_tidstore was introduced, to ensure extension development on stable branches does not hit this assertion. Reported-by: Andrei Lepikhov <lepihov@gmail.com> Author: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/b97f1850-fc7b-43c4-9b04-4e97bb9e7dc0@gmail.com Backpatch-through: 17

After a recent macOS update, building Postgres produces warnings that look like this: ranlib: warning: 'libpgport_shlib.a(pg_cpu_x86.c.o)' has no symbols ranlib: warning: 'libpgport_shlib.a(pg_popcount_x86.c.o)' has no symbols To fix, add a dummy symbol to files that may otherwise have none. Per project policy, this is a candidate for back-patching into out-of-support branches: it suppresses annoying compiler warnings but changes no behavior. Reported-by: Zhang Mingli <zmlpostgres@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/229aaaf3-f529-44ed-8e50-00cb6909af21%40Spark Backpatch-through: 13

"lock" a values is supported since 4019f72, but the error message of the function used when specifying an incorrect value forgot about it. Author: Maksim Logvinenko <logvinenko-ms@yandex.ru> Discussion: https://postgr.es/m/433431777389005@mail.yandex.ru

This reverts portions of commit 6dcfac9, which is wrong in trying to use a *GetDatum() that matches with the C types of the values read. *GetDatum() should match with the output argument types of the SQL functions. The portions of 6dcfac9 that are right regarding this rule are: - gistget.c, where the GiST support functions use DatumGetUInt16() to retrieve the strategy number. - The BRIN code for strategynum, used in syscache lookups. The adjustments done in this commit are for pageinspect, pg_buffercache and pg_lock_status(). While double-checking the whole state of the tree regarding non-matching pairs of DatumGet*() and *GetDatum(), I have found much more code paths that are incorrect, unrelated to 6dcfac9. These may be adjusted in the future, in a different patch (perhaps not for v19, as we are already past feature freeze). Reported-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/97f9375a-be61-4272-a44d-408337fe8fa6@eisentraut.org Discussion: https://postgr.es/m/CAJ7c6TMcGu8qmRe1gZfJ-gOzVnZq-t=fwn-UuyStx1w6ZyydMw@mail.gmail.com

michaelpq and others added 30 commits April 6, 2026 08:51

petere and others added 29 commits April 23, 2026 21:52

Add missing source files to several nls.mk

aa27a33

Don't use INT64_FORMAT in translatable string

7d7e58f

Use PRId64 instead.

doc: Fix grammar in some logical replication pages

31b9d90

Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PuvY_wYLPJ4DTs7NE9Lu2ty4d-OgZAOJC-NvCM=2wwcQQ@mail.gmail.com Backpatch-through: 14

Remove unused ByteaSortSupport.abbreviate field

a0302ea

Oversight in commit 9303d62. Author: Aleksander Alekseev <aleksander@tigerdata.com> Discussion: https://postgr.es/m/CAJ7c6TOsKmmgyA6EwxKVsNeHFHrWXYdgZivgjo_ujf890BpeeA@mail.gmail.com

Fix timezone dependence in test_misc/012_ddlutils.pl

df1bac4

The tests introduced in c529ee3 are timezone sensitive. Pin the cluster's timezone to UTC at init time so timestamptz output is deterministic regardless of the host's local timezone.

Add index-only scan counters to index stats

1574dfa

NikolayS force-pushed the master branch from 34be85f to 1f62dbf Compare May 17, 2026 07:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add index-only scan counters to index stats#35

Add index-only scan counters to index stats#35
NikolayS wants to merge 2752 commits into
masterfrom
ios-index-stats

NikolayS commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

NikolayS commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants