feat: pin database image to prevent silent changes on CP upgrade #403

Merged · tsivaprasad merged 1 commit into main from PLAT-600-pin-database-image-to-prevent-silent-changes-on-cp-upgrade · Jul 1, 2026

tsivaprasad (Contributor) commented Jun 10, 2026

Summary

This PR pins each database instance's container image at creation time by persisting it in ResolvedImage within etcd. Subsequent reconciles use the stored ResolvedImage directly rather than re-resolving the image from the manifest, ensuring that Control Plane upgrades cannot inadvertently change the image of an existing running database.

Changes

  • Updated ReconcileInstanceSpec in orchestrator/swarm/orchestrator.go to preserve the existing ResolvedImage when PgEdgeVersion remains unchanged. This allows resolveInstanceImages() to take the fast path and avoid manifest lookups during normal reconciles. When the version changes, the stale image pin is cleared and re-resolved from the manifest.
  • Added resolveServiceImage() and ReconcileServiceInstanceSpec() in orchestrator/swarm/orchestrator.go, extending image pinning behavior to MCP, RAG, and PostgREST service instances using the same pattern as PostgreSQL instances.
  • Added ReconcileServiceInstanceSpec() to the Orchestrator interface in database/orchestrator.go.
  • Updated database/service.go to invoke s.orchestrator.ReconcileServiceInstanceSpec() before persisting service instance specifications.
  • Updated database/reconcile_versions.go so that when the instance monitor detects a version change and updates PgEdgeVersion, it also clears ResolvedImage. This ensures the next reconcile derives the correct image for the new version and prevents no-op updates from reverting externally upgraded instances back to an older image.
  • Added a no-op implementation of ReconcileServiceInstanceSpec() in orchestrator/systemd/orchestrator.go to maintain interface compatibility.
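The carry-forward rule described for ReconcileInstanceSpec can be sketched as follows. This is a minimal illustration, not the code from orchestrator/swarm/orchestrator.go: the `SwarmOpts` and `InstanceSpec` types, the plain-string `PgEdgeVersion`, and the helper name `reconcilePin` are all simplified assumptions.

```go
package main

import "fmt"

// SwarmOpts and InstanceSpec are hypothetical, simplified stand-ins for the
// real spec types. Only the fields named in this PR are modeled.
type SwarmOpts struct {
	ResolvedImage string // image pinned at creation time
}

type InstanceSpec struct {
	PgEdgeVersion string // assumed to be a plain string for this sketch
	Swarm         SwarmOpts
}

// reconcilePin keeps the existing pin when the version is unchanged and
// clears it when the version changes, so a later step re-resolves it.
func reconcilePin(old, updated *InstanceSpec) {
	if old == nil {
		return // new instance: nothing to carry forward
	}
	if old.PgEdgeVersion == updated.PgEdgeVersion {
		updated.Swarm.ResolvedImage = old.Swarm.ResolvedImage
		return
	}
	updated.Swarm.ResolvedImage = "" // stale pin: re-resolve from the manifest
}

func main() {
	old := &InstanceSpec{
		PgEdgeVersion: "17.9",
		Swarm:         SwarmOpts{ResolvedImage: "ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2"},
	}

	same := &InstanceSpec{PgEdgeVersion: "17.9"}
	reconcilePin(old, same)
	fmt.Printf("unchanged version keeps pin: %q\n", same.Swarm.ResolvedImage)

	bumped := &InstanceSpec{PgEdgeVersion: "17.10"}
	reconcilePin(old, bumped)
	fmt.Printf("changed version clears pin: %q\n", bumped.Swarm.ResolvedImage)
}
```

With the pin carried forward, a resolver that checks ResolvedImage first never needs to consult the manifest during a normal reconcile.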

Testing

Test Scenarios

1. No Image Override

  • Create a database without specifying an image override (create_db_with_no_image.json).
  • Verify the manifest image (17.9-spock5.0.6-standard-2) is selected and stored in ResolvedImage.
  • Perform a no-op update and confirm the image remains unchanged.

2. User Image Override

  • Create a database with a custom image (for example, my-custom-image) using create_db.json.
  • Verify the custom image is deployed directly.
  • Confirm validation warnings are returned.
  • Verify ResolvedImage is not persisted.
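Scenarios 1 and 2 together imply a resolution order: a user override wins and is never pinned, an existing pin is reused, and only otherwise is the manifest consulted and the result stored. The sketch below assumes that order. The `Image` override field name, the `manifest` map, and `resolveImage` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

type SwarmOpts struct {
	Image         string // user-supplied override (hypothetical field name)
	ResolvedImage string // pin stored after a manifest lookup
}

type InstanceSpec struct {
	PgEdgeVersion string
	Swarm         SwarmOpts
}

// manifest maps a version to its default image. Contents are illustrative.
var manifest = map[string]string{
	"17.9": "ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2",
}

// resolveImage returns the image to deploy and pins it only when it came
// from the manifest.
func resolveImage(spec *InstanceSpec) (string, error) {
	if spec.Swarm.Image != "" {
		return spec.Swarm.Image, nil // override: deployed as-is, not pinned
	}
	if spec.Swarm.ResolvedImage != "" {
		return spec.Swarm.ResolvedImage, nil // fast path: no manifest lookup
	}
	img, ok := manifest[spec.PgEdgeVersion]
	if !ok {
		return "", fmt.Errorf("no manifest image for version %q", spec.PgEdgeVersion)
	}
	spec.Swarm.ResolvedImage = img // pin for subsequent reconciles
	return img, nil
}

func main() {
	noOverride := &InstanceSpec{PgEdgeVersion: "17.9"}
	img, _ := resolveImage(noOverride)
	fmt.Println("deployed:", img, "pinned:", noOverride.Swarm.ResolvedImage != "")

	custom := &InstanceSpec{PgEdgeVersion: "17.9", Swarm: SwarmOpts{Image: "my-custom-image"}}
	img, _ = resolveImage(custom)
	fmt.Println("deployed:", img, "pinned:", custom.Swarm.ResolvedImage != "")
}
```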

3. External Upgrade Followed by No-Op Update

  • Perform an external image upgrade (for example, from 17.9 to 17.10).
docker service update \
  --image ghcr.io/pgedge/pgedge-postgres:17.10-spock5.0.8-standard-1 \
  --no-healthcheck \
  postgres-storefront-no-image-n1-689qacsi

postgres-storefront-no-image-n1-689qacsi
overall progress: 1 out of 1 tasks 
1/1: running   [==================================================>] 
verify: Service postgres-storefront-no-image-n1-689qacsi converged 

docker service ls      
ID             NAME                                       MODE         REPLICAS   IMAGE                                                        PORTS
znrg8hhoo28c   postgres-storefront-no-image-n1-9ptayhma   replicated   1/1        ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2    
m7jks7yktzta   postgres-storefront-no-image-n1-689qacsi   replicated   1/1        ghcr.io/pgedge/pgedge-postgres:17.10-spock5.0.8-standard-1   
l7i7y5saauzs   postgres-storefront-no-image-n1-ant97dj4   replicated   1/1        ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2 
  • Trigger a no-op update through the Control Plane.
  • Verify the instance remains on 17.10 and is not reverted to the previous image.
docker service ls --filter label=pgedge.database.id=storefront-no-image \
  --format '{{.Name}} {{.Image}}'

postgres-storefront-no-image-n1-9ptayhma ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2
postgres-storefront-no-image-n1-689qacsi ghcr.io/pgedge/pgedge-postgres:17.10-spock5.0.8-standard-1
postgres-storefront-no-image-n1-ant97dj4 ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2
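Scenario 3 depends on the instance monitor dropping the stale pin when it observes a new running version. A rough sketch of that step, assuming a hypothetical `syncObservedVersion` helper and a string-typed version (the actual logic lives in database/reconcile_versions.go and may differ):

```go
package main

import "fmt"

// InstanceSpec is a simplified, hypothetical stand-in for the stored spec.
type InstanceSpec struct {
	PgEdgeVersion string
	ResolvedImage string
}

// syncObservedVersion mimics the monitor step: when the running version
// differs from the stored one, record the new version and drop the pin so
// the next reconcile derives the image for the new version.
// It reports whether the spec was changed.
func syncObservedVersion(spec *InstanceSpec, observed string) bool {
	if observed == spec.PgEdgeVersion {
		return false
	}
	spec.PgEdgeVersion = observed
	spec.ResolvedImage = ""
	return true
}

func main() {
	spec := &InstanceSpec{
		PgEdgeVersion: "17.9",
		ResolvedImage: "ghcr.io/pgedge/pgedge-postgres:17.9-spock5.0.6-standard-2",
	}

	// The instance was upgraded externally with `docker service update`.
	changed := syncObservedVersion(spec, "17.10")
	fmt.Printf("changed=%v version=%s pin=%q\n", changed, spec.PgEdgeVersion, spec.ResolvedImage)
}
```

Without the clear, the 17.9 pin would survive the version update and a later no-op update could redeploy the older image.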

Checklist

  • [x] Tests added

[PLAT-600](https://pgedge.atlassian.net/browse/PLAT-600)

[PLAT-600]: https://pgedge.atlassian.net/browse/PLAT-600?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

coderabbitai Bot commented Jun 10, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 82c5ac8 and d1fd436.

📒 Files selected for processing (7)
  • server/internal/database/orchestrator.go
  • server/internal/database/reconcile_versions.go
  • server/internal/database/service.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/swarm/reconcile_instance_spec_test.go
  • server/internal/orchestrator/swarm/resolve_service_image_test.go
  • server/internal/orchestrator/systemd/orchestrator.go

Walkthrough

This PR adds a service-instance reconciliation hook, updates Swarm image pin handling for instance and service specs, routes service image lookup through a shared resolver, and adds tests for the new reconciliation and resolution behavior.

Changes

Service Instance Spec Reconciliation with Image Pinning

| Layer / File(s) | Summary |
|---|---|
| Interface Contract and Database Service Integration: server/internal/database/orchestrator.go, server/internal/database/service.go, server/internal/database/reconcile_versions.go | Adds ReconcileServiceInstanceSpec(old, new *ServiceInstanceSpec) error to the orchestrator interface, calls it during service instance reconciliation, and clears Swarm.ResolvedImage when database instance versions change. |
| Instance Spec Reconciliation (Swarm), Implementation and Tests: server/internal/orchestrator/swarm/orchestrator.go, server/internal/orchestrator/swarm/reconcile_instance_spec_test.go | Updates instance reconciliation to preserve or clear Swarm.ResolvedImage based on version changes, with tests covering creation, carry-forward, backfill, refresh, and override cases. |
| Service Image Resolution and Reconciliation (Swarm), Implementation and Tests: server/internal/orchestrator/swarm/orchestrator.go, server/internal/orchestrator/swarm/resolve_service_image_test.go | Adds service image resolution precedence and service-spec reconciliation behavior, updates MCP/RAG resource generation to use the shared resolver, and tests the new resolution and reconciliation paths. |
| Systemd Orchestrator Stub Implementation: server/internal/orchestrator/systemd/orchestrator.go | Adds a no-op ReconcileServiceInstanceSpec implementation for the systemd orchestrator. |
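The interface change and the two implementations summarized above could look roughly like this. Only the method signature comes from the PR. The `ServiceInstanceSpec` fields, the receiver type names, and the Swarm behavior shown here (mirroring the instance carry-forward rule) are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceInstanceSpec is a hypothetical, reduced version of the real type.
type ServiceInstanceSpec struct {
	Version       string // assumed field
	ResolvedImage string // assumed placement of the pin
}

// Orchestrator shows only the method added by this PR.
type Orchestrator interface {
	ReconcileServiceInstanceSpec(old, new *ServiceInstanceSpec) error
}

type swarmOrchestrator struct{}

// Swarm: carry the pin forward when the version is unchanged, else clear it.
func (swarmOrchestrator) ReconcileServiceInstanceSpec(old, updated *ServiceInstanceSpec) error {
	if updated == nil {
		return errors.New("updated spec is required")
	}
	if old != nil && old.Version == updated.Version {
		updated.ResolvedImage = old.ResolvedImage
	} else {
		updated.ResolvedImage = ""
	}
	return nil
}

type systemdOrchestrator struct{}

// systemd: no container images to pin, so this is a no-op that exists only
// to satisfy the interface.
func (systemdOrchestrator) ReconcileServiceInstanceSpec(old, updated *ServiceInstanceSpec) error {
	return nil
}

func main() {
	old := &ServiceInstanceSpec{Version: "1.0", ResolvedImage: "example.invalid/mcp:1.0"}
	for _, o := range []Orchestrator{swarmOrchestrator{}, systemdOrchestrator{}} {
		updated := &ServiceInstanceSpec{Version: "1.0"}
		if err := o.ReconcileServiceInstanceSpec(old, updated); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%T -> pin %q\n", o, updated.ResolvedImage)
	}
}
```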

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly matches the main change: pinning database images to prevent silent upgrades during control-plane updates. |
| Description check | ✅ Passed | The description covers the required Summary, Changes, Testing, and Checklist sections, with only some optional checklist items left unfilled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@codacy-production

Up to standards ✅

🟢 Issues: 2 medium (2 new issues)

| Category | Results |
|---|---|
| Complexity | 2 medium |

🟢 Metrics: 28 complexity · 0 duplication

| Metric | Results |
|---|---|
| Complexity | 28 |
| Duplication | 0 |


@tsivaprasad

Contributor Author

@coderabbitai review

coderabbitai Bot commented Jun 10, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@jason-lynch

Member

@tsivaprasad Could you please remind me real quick about how someone will upgrade when we bump the image version? For example, say someone is running Spock 5.0.9 and Postgres 17.10, and we release a new image with Spock 5.0.10 and Postgres 17.10. Both of those images have the same pgEdge version, so how did we decide to enable that person to upgrade?

tsivaprasad force-pushed the PLAT-599-allow-custom-image-override-per-database-or-node branch from cb8ce47 to 7c3f46b (June 30, 2026 18:10)
Base automatically changed from PLAT-599-allow-custom-image-override-per-database-or-node to main (July 1, 2026 03:03)
tsivaprasad force-pushed the PLAT-600-pin-database-image-to-prevent-silent-changes-on-cp-upgrade branch from 82c5ac8 to d1fd436 (July 1, 2026 05:57)

tsivaprasad commented Jul 1, 2026

Contributor Author

> @tsivaprasad Could you please remind me real quick about how someone will upgrade when we bump the image version? For example, say someone is running Spock 5.0.9 and Postgres 17.10, and we release a new image with Spock 5.0.10 and Postgres 17.10. Both of those images have the same pgEdge version, so how did we decide to enable that person to upgrade?

Good question. Right now, if only the image changes (for example, spock5.0.9 → spock5.0.10 while staying on PostgreSQL 17.10), the image pin is not cleared automatically.

The expected way to handle this is through the minor version upgrade flow (PLAT-603). That flow updates the spec version, calls ApplyUpgrade, and triggers ReconcileInstanceSpec, which clears and re-resolves ResolvedImage.

For image-only updates where the PostgreSQL and Spock versions stay the same, we'll need a separate solution, such as a "repull latest image for this version" API or a way to force-clear ResolvedImage. We can track that as a follow-up in PLAT-641.
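To make the gap concrete, here is a small sketch of what happens to a pinned instance when the manifest changes but the spec version does not, plus one possible shape for a force-clear. Everything here is hypothetical: the `reconcile` and `forceClear` helpers, the flat `InstanceSpec`, and the `example.invalid` image tags are placeholders, and the real follow-up in PLAT-641 may take a different form.

```go
package main

import "fmt"

// InstanceSpec is a flattened, hypothetical stand-in for the stored spec.
type InstanceSpec struct {
	PgEdgeVersion string
	ResolvedImage string
}

// reconcile carries the pin forward for an unchanged version, clears it
// otherwise, and resolves from the manifest only when no pin remains.
func reconcile(old, updated *InstanceSpec, manifest map[string]string) {
	if old != nil && old.PgEdgeVersion == updated.PgEdgeVersion {
		updated.ResolvedImage = old.ResolvedImage
	} else {
		updated.ResolvedImage = ""
	}
	if updated.ResolvedImage == "" {
		updated.ResolvedImage = manifest[updated.PgEdgeVersion]
	}
}

// forceClear is one imaginable escape hatch: drop the pin explicitly so the
// next reconcile picks up the current manifest image for the same version.
func forceClear(spec *InstanceSpec) { spec.ResolvedImage = "" }

func main() {
	// Placeholder tags, not real published images.
	manifestV1 := map[string]string{"17.10": "example.invalid/pg:17.10-spock5.0.9"}
	manifestV2 := map[string]string{"17.10": "example.invalid/pg:17.10-spock5.0.10"}

	created := &InstanceSpec{PgEdgeVersion: "17.10"}
	reconcile(nil, created, manifestV1)

	// Control Plane upgrade ships manifestV2, version unchanged: pin holds.
	afterUpgrade := &InstanceSpec{PgEdgeVersion: "17.10"}
	reconcile(created, afterUpgrade, manifestV2)
	fmt.Println("after CP upgrade:", afterUpgrade.ResolvedImage)

	// Explicit force-clear, then reconcile: the new image is picked up.
	forceClear(afterUpgrade)
	repulled := &InstanceSpec{PgEdgeVersion: "17.10"}
	reconcile(afterUpgrade, repulled, manifestV2)
	fmt.Println("after force-clear:", repulled.ResolvedImage)
}
```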

tsivaprasad requested a review from jason-lynch (July 1, 2026 07:13)
tsivaprasad merged commit e6cf817 into main (Jul 1, 2026); 3 checks passed
tsivaprasad deleted the PLAT-600-pin-database-image-to-prevent-silent-changes-on-cp-upgrade branch (July 1, 2026 12:14)