Skip to content

Comments

K8SPG-911: Add pg_tde support#1440

Open
egegunes wants to merge 2 commits intomainfrom
K8SPG-911
Open

K8SPG-911: Add pg_tde support#1440
egegunes wants to merge 2 commits intomainfrom
K8SPG-911

Conversation

@egegunes
Copy link
Contributor

@egegunes egegunes commented Feb 12, 2026

CHANGE DESCRIPTION

This commit adds native pg_tde extension support into operator.

This commit only adds Vault KMS support for pg_tde. KMIP support will be added in future releases.

When pg_tde is enabled and Vault configuration is provided, the operator:

  • appends pg_tde into shared_preload_libraries,
  • mounts Vault token and CA secrets into database containers,
  • runs CREATE EXTENSION in all databases,
  • creates Vault provider by running pg_tde_add_global_key_provider_vault_v2,
  • create a global key by running pg_tde_create_key_using_global_key_provider,
  • sets the default key by running pg_tde_set_default_key_using_global_key_provider.

Example configuration

pg_tde:
  enabled: true
  vault:
    host: https://vault-service.vault-service.svc:8200
    mountPath: tde
    tokenSecret:
      name: vault-secret
      key: token
    caSecret:
      name: vault-secret
      key: ca.crt

Note that:

  • Mount path needs to be a KV v2 storage engine.
  • caSecret is optional and can be omitted if you want to use http. But in my testing I couldn't manage the make vault work without TLS. It responds with HTTP 405 if I disable TLS in vault.
  • tokenSecret and caSecret can be the same secret or different. Operator doesn't assume anything about the contents of the secrets since you'll need to set secret keys in cr.yaml yourself.
  • Using a non-root token requires more configuration. Check out pg_tde docs for that. But don't forget to add these in the Vault policy:
path "sys/internal/ui/mounts/*" {
      capabilities = ["read"]
}
    
path "sys/mounts/*" {
      capabilities = ["read"]
}

API changes

pg_tde requires more configuration options than other extensions operator supports. This required us make some changes in the extensions API. With these changes, 'spec.extensions.builtin' section is deprecated and all builtin extensions are moved to 'spec.extensions.' (i.e. 'spec.extensions.pg_stat_monitor'). Right now extensions can be enabled/disabled with the old and the new method. If two methods are used at the same time, 'spec.extensions.builtin' takes precedence.

Status changes

A hash will be calculated using pg_tde configuration provided by user. Operator uses this hash to understand if config is changed and it should reconfigure pg_tde. The hash can be found in status.pgTDERevision field of PostgresCluster object. This hash will be removed when pg_tde is disabled.

Operator also communicates the status of pg_tde with conditions. The condition with type=PGTDEEnabled can be found in both PerconaPGCluster and PostgresCluster statuses.

Disabling pg_tde

Disabling pg_tde is more complex than other extensions:

  • First of all any encrypted objects must be dropped before disabling. Otherwise DROP EXTENSION will fail with a descriptive error message. Operator won't drop anything, user needs to do this manually.
  • The extension needs to be disabled in two steps:
    1. First set pg_tde.enabled=false without removing the vault section. Operator will drop the extension and restart the pods.
    2. Then you can remove pg_tde.vault. Database pods will be restarted again to remove secret mounts from containers.
  • It's recommended to run CHECKPOINT before removing pg_tde.vault. Even though extension is dropped, Postgres might still try to use encrypted objects during recovery after restart and it might try to access token secret. CHECKPOINT helps you prevent this failure case.

Deleting and recreating clusters

If cluster with pg_tde enabled is deleted but PVCs are retained, on recreation you'll see some errors about pg_tde in operator logs. They happen because the vault provider and/or global key already exists. Operator will handle these errors gracefully and configure pg_tde. Same thing applies when pg_tde is disabled and re-enabled. Since both vault provider and global key already exists, operator will handle "already exists" errors and configure pg_tde.

The global key name is determined by cluster's .metadata.uid. For example 'global-master-key-ad19534a-d778-460e-ac87-ca38ef5e6755'. This means the key will be changed if cluster is deleted and recreated. As long as the old key and the new key is accessible to pg_tde, this won't cause any issues. pg_tde will handle it as it handles key rotation.

Validations

  • You can't set pg_tde.enabled=true without setting pg_tde.vault.
  • If you already had pg_tde.enabled, you can't remove pg_tde section completely.
  • If you already had pg_tde.enabled, you can't remove pg_tde.vault section completely.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

Copilot AI review requested due to automatic review settings February 12, 2026 13:58
@egegunes egegunes added this to the v2.9.0 milestone Feb 12, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds pg_tde (Transparent Data Encryption) support for PostgreSQL clusters, integrating with HashiCorp Vault for key management. The implementation follows the existing extension pattern in the codebase and includes API changes, reconciliation logic, and CRD updates.

Changes:

  • Added pg_tde extension support with Vault integration for managing encryption keys
  • Introduced new API types (PGTDESpec, PGTDEVaultSpec, PGTDESecretObjectReference) for configuring pg_tde
  • Refactored extension configuration to use individual extension structs instead of the deprecated builtin structure
  • Added reconciliation logic for configuring pg_tde providers and managing encryption keys

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
pkg/apis/postgres-operator.crunchydata.com/v1beta1/postgrescluster_types.go Added PGTDESpec, PGTDEVaultSpec, and PGTDESecretObjectReference types for pg_tde configuration
pkg/apis/postgres-operator.crunchydata.com/v1beta1/zz_generated.deepcopy.go Auto-generated DeepCopy methods for new pg_tde types
pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go Added BuiltInExtensionSpec and pg_tde support; refactored extension defaults handling
pkg/apis/pgv2.percona.com/v2/zz_generated.deepcopy.go Auto-generated DeepCopy methods for BuiltInExtensionSpec
internal/pgtde/postgres.go New package implementing pg_tde enable/disable and Vault provider configuration
internal/postgres/reconcile.go Added pg_tde volume and volume mount configuration for PostgreSQL instance pods
internal/controller/postgrescluster/postgres.go Added reconcilePGTDEProviders function to configure pg_tde Vault providers
internal/controller/postgrescluster/controller.go Integrated pg_tde parameters and provider reconciliation into main reconciliation flow
internal/controller/postgrescluster/instance.go Added TDEInstalledAnnotation to instance pods when pg_tde is enabled
internal/naming/names.go Added constants for pg_tde volume names, mount paths, and provider identifiers
internal/naming/annotations.go Added TDEInstalledAnnotation constant
internal/pgvector/postgres.go Fixed comments to correctly reference pgvector instead of pgAudit
percona/controller/pgbackup/controller.go Added pg.Default() call to ensure extension defaults before backup
deploy/.yaml and config/crd/bases/.yaml Updated CRD definitions with pg_tde specifications
deploy/cr.yaml Added example pg_tde configuration with Vault settings
pkg/apis/pgv2.percona.com/v2/perconapgbackup_types_test.go Replaced custom ptr() function with k8s.io/utils/ptr.To
Comments suppressed due to low confidence (1)

pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:436

  • Inconsistent usage of extension enabled flags. Line 432 uses cr.Spec.Extensions.PGStatMonitor.Enabled (the new field structure), while lines 433-436 still use cr.Spec.Extensions.BuiltIn.* (the deprecated field structure). All lines should be updated to use the new field structure consistently for maintainability.
	postgresCluster.Spec.Extensions.PGStatMonitor = *cr.Spec.Extensions.PGStatMonitor.Enabled
	postgresCluster.Spec.Extensions.PGStatStatements = *cr.Spec.Extensions.BuiltIn.PGStatStatements
	postgresCluster.Spec.Extensions.PGAudit = *cr.Spec.Extensions.BuiltIn.PGAudit
	postgresCluster.Spec.Extensions.PGVector = *cr.Spec.Extensions.BuiltIn.PGVector
	postgresCluster.Spec.Extensions.PGRepack = *cr.Spec.Extensions.BuiltIn.PGRepack

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings February 16, 2026 13:05
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 39 changed files in this pull request and generated 7 comments.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 39 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings February 18, 2026 09:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 40 changed files in this pull request and generated 3 comments.

Copilot AI review requested due to automatic review settings February 19, 2026 18:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 43 out of 45 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:451

  • Inconsistent use of new vs deprecated extension fields when converting to Crunchy format. Line 447 uses the new field cr.Spec.Extensions.PGStatMonitor.Enabled, but lines 448-451 use the deprecated cr.Spec.Extensions.BuiltIn.* fields. While SetExtensionDefaults ensures both are populated for backwards compatibility, this inconsistency is error-prone and makes the code harder to maintain. Consider using the new field structure consistently (e.g., *cr.Spec.Extensions.PGStatStatements.Enabled instead of *cr.Spec.Extensions.BuiltIn.PGStatStatements).

@matserix
Copy link

This looks really interesting, definitely would help to reduce doing some workarounds.

As KMIP is the standard protocol which is also supported by Vault and most cloud providers integrate it into their secret manager, it should/must be supported besides Vault.

@egegunes
Copy link
Contributor Author

This looks really interesting, definitely would help to reduce doing some workarounds.

As KMIP is the standard protocol which is also supported by Vault and most cloud providers integrate it into their secret manager, it should/must be supported besides Vault.

@matserix we decided to just start with Vault. KMIP support will be added in future releases.

This commit adds native pg_tde extension support into operator.

**This commit only adds Vault KMS support for pg_tde. KMIP support will
be added in future releases.**

When pg_tde is enabled and Vault configuration is provided, the operator:
- appends pg_tde into shared_preload_libraries,
- mounts Vault token and CA secrets into database containers,
- runs CREATE EXTENSION in all databases,
- creates Vault provider by running pg_tde_add_global_key_provider_vault_v2,
- create a global key by running pg_tde_create_key_using_global_key_provider,
- sets the default key by running pg_tde_set_default_key_using_global_key_provider.

-> Example configuration

pg_tde:
  enabled: true
  vault:
    host: https://vault-service.vault-service.svc:8200
    mountPath: tde
    tokenSecret:
      name: vault-secret
      key: token
    caSecret:
      name: vault-secret
      key: ca.crt

Note that:
- Mount path needs to be a KV v2 storage engine.
- caSecret is optional and can be omitted if you want to use http. But
  in my testing I couldn't manage the make vault work without TLS. It
  responds with HTTP 405 if I disable TLS in vault.
- tokenSecret and caSecret can be the same secret or different. Operator
  doesn't assume anything about the contents of the secrets since you'll
  need to set secret keys in cr.yaml yourself.
- Using a non-root token requires more configuration. Check out pg_tde
  docs for that. But don't forget to add these in the Vault policy:

```
path "sys/internal/ui/mounts/*" {
      capabilities = ["read"]
}

path "sys/mounts/*" {
      capabilities = ["read"]
}
```

-> API changes

pg_tde requires more configuration options than other extensions
operator supports. This required us make some changes in the extensions
API. With these changes, 'spec.extensions.builtin' section is deprecated
and all builtin extensions are moved to 'spec.extensions.<extension>'
(i.e. 'spec.extensions.pg_stat_monitor'). Right now extensions can be
enabled/disabled with the old and the new method. If two methods are
used at the same time, 'spec.extensions.builtin' takes precedence.

-> Status changes

A hash will be calculated using pg_tde configuration provided by user.
Operator uses this hash to understand if config is changed and it should
reconfigure pg_tde. The hash can be found in status.pgTDERevision field
of **PostgresCluster** object. This hash will be removed when pg_tde is
disabled.

Operator also communicates the status of pg_tde with conditions. The
condition with type=PGTDEEnabled can be found in both PerconaPGCluster
and PostgresCluster statuses.

-> Disabling pg_tde

Disabling pg_tde is more complex than other extensions:
- First of all any encrypted objects must be dropped before disabling.
  Otherwise DROP EXTENSION will fail with a descriptive error message.
  **Operator won't drop anything, user needs to do this manually.**
- The extension needs to be disabled in two steps:
  1. First set pg_tde.enabled=false without removing the vault section.
     Operator will drop the extension and restart the pods.
  2. Then you can remove pg_tde.vault. Database pods will be restarted
     again to remove secret mounts from containers.
- It's recommended to run CHECKPOINT before removing pg_tde.vault. Even
  though extension is dropped, Postgres might still try to use encrypted
  objects during recovery after restart and it might try to access token
  secret. CHECKPOINT helps you prevent this failure case.

-> Deleting and recreating clusters

If cluster with pg_tde enabled is deleted but PVCs are retained, on
recreation you'll see some errors about pg_tde in operator logs. They
happen because the vault provider and/or global key already exists.
Operator will handle these errors gracefully and configure pg_tde. Same
thing applies when pg_tde is disabled and re-enabled. Since both vault
provider and global key already exists, operator will handle "already
exists" errors and configure pg_tde.

The global key name is determined by cluster's .metadata.uid. For
example 'global-master-key-ad19534a-d778-460e-ac87-ca38ef5e6755'. This
means the key will be changed if cluster is deleted and recreated. As
long as the old key and the new key is accessible to pg_tde, this won't
cause any issues. pg_tde will handle it as it handles key rotation.

-> Validations

- You can't set pg_tde.enabled=true without setting pg_tde.vault.
- If you already had pg_tde.enabled, you can't remove pg_tde section
  completely.
- If you already had pg_tde.enabled, you can't remove pg_tde.vault
  section completely.
@egegunes egegunes marked this pull request as ready for review February 20, 2026 08:48
Copilot AI review requested due to automatic review settings February 20, 2026 08:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 43 out of 45 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:451

  • The ToCrunchy method is inconsistent in how it accesses extension configuration. Line 447 uses the new API (cr.Spec.Extensions.PGStatMonitor.Enabled), but lines 448-451 still use the old deprecated BuiltIn API for PGStatStatements, PGAudit, PGVector, and PGRepack. This inconsistency means the new extension API won't work correctly for these extensions.

All extensions should use the new API consistently, similar to how PGStatMonitor is handled.

Comment on lines +479 to +483
revision, err := safeHash32(func(hasher io.Writer) error {
_, err := fmt.Fprint(hasher, vault.Host, vault.MountPath,
vault.TokenSecret.Key, vault.CASecret.Key)
return err
})
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pg_tde configuration hash (line 479-483) only includes vault.Host, vault.MountPath, vault.TokenSecret.Key, and vault.CASecret.Key, but does not include vault.TokenSecret.Name or vault.CASecret.Name. This means if a user changes only the secret names (pointing to different secrets with the same key names), the operator won't detect this as a configuration change and won't reconfigure pg_tde, potentially causing the extension to point to outdated or incorrect secrets.

Copilot uses AI. Check for mistakes.
Comment on lines +449 to +452
if !cluster.Spec.Extensions.PGTDE.Enabled || cluster.Spec.Extensions.PGTDE.Vault == nil {
cluster.Status.PGTDERevision = ""
return nil
}
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When pg_tde is disabled or vault config is removed, the PGTDERevision field is cleared (line 450) but patchStatus() is not called to persist this change to the cluster status. This means the revision clearing won't be saved, and on the next reconciliation loop, the operator might still see the old revision value and skip reconfiguration when pg_tde is re-enabled.

Copilot uses AI. Check for mistakes.
@JNKPercona
Copy link
Collaborator

Test Name Result Time
backup-enable-disable passed 00:06:58
builtin-extensions failure 00:13:41
custom-envs passed 00:18:45
custom-extensions failure 00:05:18
custom-tls passed 00:04:50
database-init-sql passed 00:03:15
demand-backup passed 00:23:14
demand-backup-offline-snapshot passed 00:13:17
dynamic-configuration passed 00:03:23
finalizers passed 00:03:48
init-deploy passed 00:02:56
huge-pages passed 00:02:59
monitoring failure 00:15:09
monitoring-pmm3 failure 00:25:50
one-pod passed 00:05:44
operator-self-healing passed 00:09:45
pg-tde passed 00:09:02
pitr passed 00:11:47
scaling passed 00:05:23
scheduled-backup passed 00:25:45
self-healing passed 00:08:00
sidecars passed 00:02:45
standby-pgbackrest passed 00:11:47
standby-streaming passed 00:09:17
start-from-backup passed 00:10:28
tablespaces passed 00:06:47
telemetry-transfer passed 00:04:25
upgrade-consistency passed 00:05:37
upgrade-minor failure 00:15:49
users passed 00:04:23
Summary Value
Tests Run 30/30
Job Duration 02:08:32
Total Test Time 04:50:23

commit: db13556
image: perconalab/percona-postgresql-operator:PR-1440-db135567a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants