Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds pg_tde (Transparent Data Encryption) support for PostgreSQL clusters, integrating with HashiCorp Vault for key management. The implementation follows the existing extension pattern in the codebase and includes API changes, reconciliation logic, and CRD updates.
Changes:
- Added pg_tde extension support with Vault integration for managing encryption keys
- Introduced new API types (PGTDESpec, PGTDEVaultSpec, PGTDESecretObjectReference) for configuring pg_tde
- Refactored extension configuration to use individual extension structs instead of the deprecated builtin structure
- Added reconciliation logic for configuring pg_tde providers and managing encryption keys
Reviewed changes
Copilot reviewed 20 out of 22 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/apis/postgres-operator.crunchydata.com/v1beta1/postgrescluster_types.go | Added PGTDESpec, PGTDEVaultSpec, and PGTDESecretObjectReference types for pg_tde configuration |
| pkg/apis/postgres-operator.crunchydata.com/v1beta1/zz_generated.deepcopy.go | Auto-generated DeepCopy methods for new pg_tde types |
| pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go | Added BuiltInExtensionSpec and pg_tde support; refactored extension defaults handling |
| pkg/apis/pgv2.percona.com/v2/zz_generated.deepcopy.go | Auto-generated DeepCopy methods for BuiltInExtensionSpec |
| internal/pgtde/postgres.go | New package implementing pg_tde enable/disable and Vault provider configuration |
| internal/postgres/reconcile.go | Added pg_tde volume and volume mount configuration for PostgreSQL instance pods |
| internal/controller/postgrescluster/postgres.go | Added reconcilePGTDEProviders function to configure pg_tde Vault providers |
| internal/controller/postgrescluster/controller.go | Integrated pg_tde parameters and provider reconciliation into main reconciliation flow |
| internal/controller/postgrescluster/instance.go | Added TDEInstalledAnnotation to instance pods when pg_tde is enabled |
| internal/naming/names.go | Added constants for pg_tde volume names, mount paths, and provider identifiers |
| internal/naming/annotations.go | Added TDEInstalledAnnotation constant |
| internal/pgvector/postgres.go | Fixed comments to correctly reference pgvector instead of pgAudit |
| percona/controller/pgbackup/controller.go | Added pg.Default() call to ensure extension defaults before backup |
| deploy/.yaml and config/crd/bases/.yaml | Updated CRD definitions with pg_tde specifications |
| deploy/cr.yaml | Added example pg_tde configuration with Vault settings |
| pkg/apis/pgv2.percona.com/v2/perconapgbackup_types_test.go | Replaced custom ptr() function with k8s.io/utils/ptr.To |
Comments suppressed due to low confidence (1)
pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:436
- Inconsistent usage of extension enabled flags. Line 432 uses cr.Spec.Extensions.PGStatMonitor.Enabled (the new field structure), while lines 433-436 still use cr.Spec.Extensions.BuiltIn.* (the deprecated field structure). All lines should be updated to use the new field structure consistently for maintainability.
postgresCluster.Spec.Extensions.PGStatMonitor = *cr.Spec.Extensions.PGStatMonitor.Enabled
postgresCluster.Spec.Extensions.PGStatStatements = *cr.Spec.Extensions.BuiltIn.PGStatStatements
postgresCluster.Spec.Extensions.PGAudit = *cr.Spec.Extensions.BuiltIn.PGAudit
postgresCluster.Spec.Extensions.PGVector = *cr.Spec.Extensions.BuiltIn.PGVector
postgresCluster.Spec.Extensions.PGRepack = *cr.Spec.Extensions.BuiltIn.PGRepack
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/apis/postgres-operator.crunchydata.com/v1beta1/postgrescluster_types.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 43 out of 45 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:451
- Inconsistent use of new vs deprecated extension fields when converting to Crunchy format. Line 447 uses the new field
cr.Spec.Extensions.PGStatMonitor.Enabled, but lines 448-451 use the deprecatedcr.Spec.Extensions.BuiltIn.*fields. While SetExtensionDefaults ensures both are populated for backwards compatibility, this inconsistency is error-prone and makes the code harder to maintain. Consider using the new field structure consistently (e.g.,*cr.Spec.Extensions.PGStatStatements.Enabledinstead of*cr.Spec.Extensions.BuiltIn.PGStatStatements).
|
This looks really interesting, definitely would help to reduce doing some workarounds. As KMIP is the standard protocol which is also supported by Vault and most cloud providers integrate it into their secret manager, it should/must be supported besides Vault. |
@matserix we decided to just start with Vault. KMIP support will be added in future releases. |
This commit adds native pg_tde extension support into operator.
**This commit only adds Vault KMS support for pg_tde. KMIP support will
be added in future releases.**
When pg_tde is enabled and Vault configuration is provided, the operator:
- appends pg_tde into shared_preload_libraries,
- mounts Vault token and CA secrets into database containers,
- runs CREATE EXTENSION in all databases,
- creates Vault provider by running pg_tde_add_global_key_provider_vault_v2,
- create a global key by running pg_tde_create_key_using_global_key_provider,
- sets the default key by running pg_tde_set_default_key_using_global_key_provider.
-> Example configuration
pg_tde:
enabled: true
vault:
host: https://vault-service.vault-service.svc:8200
mountPath: tde
tokenSecret:
name: vault-secret
key: token
caSecret:
name: vault-secret
key: ca.crt
Note that:
- Mount path needs to be a KV v2 storage engine.
- caSecret is optional and can be omitted if you want to use http. But
in my testing I couldn't manage the make vault work without TLS. It
responds with HTTP 405 if I disable TLS in vault.
- tokenSecret and caSecret can be the same secret or different. Operator
doesn't assume anything about the contents of the secrets since you'll
need to set secret keys in cr.yaml yourself.
- Using a non-root token requires more configuration. Check out pg_tde
docs for that. But don't forget to add these in the Vault policy:
```
path "sys/internal/ui/mounts/*" {
capabilities = ["read"]
}
path "sys/mounts/*" {
capabilities = ["read"]
}
```
-> API changes
pg_tde requires more configuration options than other extensions
operator supports. This required us make some changes in the extensions
API. With these changes, 'spec.extensions.builtin' section is deprecated
and all builtin extensions are moved to 'spec.extensions.<extension>'
(i.e. 'spec.extensions.pg_stat_monitor'). Right now extensions can be
enabled/disabled with the old and the new method. If two methods are
used at the same time, 'spec.extensions.builtin' takes precedence.
-> Status changes
A hash will be calculated using pg_tde configuration provided by user.
Operator uses this hash to understand if config is changed and it should
reconfigure pg_tde. The hash can be found in status.pgTDERevision field
of **PostgresCluster** object. This hash will be removed when pg_tde is
disabled.
Operator also communicates the status of pg_tde with conditions. The
condition with type=PGTDEEnabled can be found in both PerconaPGCluster
and PostgresCluster statuses.
-> Disabling pg_tde
Disabling pg_tde is more complex than other extensions:
- First of all any encrypted objects must be dropped before disabling.
Otherwise DROP EXTENSION will fail with a descriptive error message.
**Operator won't drop anything, user needs to do this manually.**
- The extension needs to be disabled in two steps:
1. First set pg_tde.enabled=false without removing the vault section.
Operator will drop the extension and restart the pods.
2. Then you can remove pg_tde.vault. Database pods will be restarted
again to remove secret mounts from containers.
- It's recommended to run CHECKPOINT before removing pg_tde.vault. Even
though extension is dropped, Postgres might still try to use encrypted
objects during recovery after restart and it might try to access token
secret. CHECKPOINT helps you prevent this failure case.
-> Deleting and recreating clusters
If cluster with pg_tde enabled is deleted but PVCs are retained, on
recreation you'll see some errors about pg_tde in operator logs. They
happen because the vault provider and/or global key already exists.
Operator will handle these errors gracefully and configure pg_tde. Same
thing applies when pg_tde is disabled and re-enabled. Since both vault
provider and global key already exists, operator will handle "already
exists" errors and configure pg_tde.
The global key name is determined by cluster's .metadata.uid. For
example 'global-master-key-ad19534a-d778-460e-ac87-ca38ef5e6755'. This
means the key will be changed if cluster is deleted and recreated. As
long as the old key and the new key is accessible to pg_tde, this won't
cause any issues. pg_tde will handle it as it handles key rotation.
-> Validations
- You can't set pg_tde.enabled=true without setting pg_tde.vault.
- If you already had pg_tde.enabled, you can't remove pg_tde section
completely.
- If you already had pg_tde.enabled, you can't remove pg_tde.vault
section completely.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 43 out of 45 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go:451
- The ToCrunchy method is inconsistent in how it accesses extension configuration. Line 447 uses the new API (cr.Spec.Extensions.PGStatMonitor.Enabled), but lines 448-451 still use the old deprecated BuiltIn API for PGStatStatements, PGAudit, PGVector, and PGRepack. This inconsistency means the new extension API won't work correctly for these extensions.
All extensions should use the new API consistently, similar to how PGStatMonitor is handled.
| revision, err := safeHash32(func(hasher io.Writer) error { | ||
| _, err := fmt.Fprint(hasher, vault.Host, vault.MountPath, | ||
| vault.TokenSecret.Key, vault.CASecret.Key) | ||
| return err | ||
| }) |
There was a problem hiding this comment.
The pg_tde configuration hash (line 479-483) only includes vault.Host, vault.MountPath, vault.TokenSecret.Key, and vault.CASecret.Key, but does not include vault.TokenSecret.Name or vault.CASecret.Name. This means if a user changes only the secret names (pointing to different secrets with the same key names), the operator won't detect this as a configuration change and won't reconfigure pg_tde, potentially causing the extension to point to outdated or incorrect secrets.
| if !cluster.Spec.Extensions.PGTDE.Enabled || cluster.Spec.Extensions.PGTDE.Vault == nil { | ||
| cluster.Status.PGTDERevision = "" | ||
| return nil | ||
| } |
There was a problem hiding this comment.
When pg_tde is disabled or vault config is removed, the PGTDERevision field is cleared (line 450) but patchStatus() is not called to persist this change to the cluster status. This means the revision clearing won't be saved, and on the next reconciliation loop, the operator might still see the old revision value and skip reconfiguration when pg_tde is re-enabled.
commit: db13556 |
CHANGE DESCRIPTION
This commit adds native pg_tde extension support into operator.
This commit only adds Vault KMS support for pg_tde. KMIP support will be added in future releases.
When pg_tde is enabled and Vault configuration is provided, the operator:
Example configuration
Note that:
API changes
pg_tde requires more configuration options than other extensions operator supports. This required us make some changes in the extensions API. With these changes, 'spec.extensions.builtin' section is deprecated and all builtin extensions are moved to 'spec.extensions.' (i.e. 'spec.extensions.pg_stat_monitor'). Right now extensions can be enabled/disabled with the old and the new method. If two methods are used at the same time, 'spec.extensions.builtin' takes precedence.
Status changes
A hash will be calculated using pg_tde configuration provided by user. Operator uses this hash to understand if config is changed and it should reconfigure pg_tde. The hash can be found in status.pgTDERevision field of PostgresCluster object. This hash will be removed when pg_tde is disabled.
Operator also communicates the status of pg_tde with conditions. The condition with type=PGTDEEnabled can be found in both PerconaPGCluster and PostgresCluster statuses.
Disabling pg_tde
Disabling pg_tde is more complex than other extensions:
Deleting and recreating clusters
If cluster with pg_tde enabled is deleted but PVCs are retained, on recreation you'll see some errors about pg_tde in operator logs. They happen because the vault provider and/or global key already exists. Operator will handle these errors gracefully and configure pg_tde. Same thing applies when pg_tde is disabled and re-enabled. Since both vault provider and global key already exists, operator will handle "already exists" errors and configure pg_tde.
The global key name is determined by cluster's .metadata.uid. For example 'global-master-key-ad19534a-d778-460e-ac87-ca38ef5e6755'. This means the key will be changed if cluster is deleted and recreated. As long as the old key and the new key is accessible to pg_tde, this won't cause any issues. pg_tde will handle it as it handles key rotation.
Validations
CHECKLIST
Jira
Needs Doc) and QA (Needs QA)?Tests
Config/Logging/Testability