fix: needs_recreate behavior#424
Conversation
Resources can sometimes fail partway through the creation process, leaving files or other traces behind on the host. An example of this scenario is when we're able to create a Swarm service, but it fails to start. Rather than rollback the create with a delete, which could also be broken or dangerous, we record the resource into the state with the `needs_recreate` flag set to `true`. This way, if the user deletes the database or makes another update that would delete the resource, we know to run that resource's `Delete` method to ensure that it's removed from the host. Conversely, we know that we should run this resource's `Create` method again rather than its `Update` method if the user re-attempts the operation. In this case, the intent was to call `Create` in whichever phase it would normally be recreated if it didn't exist. However, since our planning process only looks at differences between states, it sees resources that need to be recreated in the initial state, even if there aren't any diffs. This lead to problems when the resource's normal phase was later in the update plan, such as when you fix a misconfigured supporting service resource. This commit modifies an existing process that handled pending deletions so that it removes resources that are pending recreation from the start state. This enables us to plan their creation at the appropriate phase. I've included a golden test that demonstrates this behavior. I've also modified some existing PostgREST E2Es to test this behavior because because that's where we first noticed this issue. PLAT-668
📝 WalkthroughWalkthroughChanges State's ChangesRecreate handling and failed deployment recovery
PostgREST e2e recovery tests
Compact metadata: 5 files changed, +125/-4 lines across e2e tests, database operations, and resource state. Suggested labels: go, tests, bug-fix Suggested reviewers: pgEdge control-plane maintainers familiar with database operations and resource state reconciliation Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 0 |
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Summary
This commit modifies an existing process that handled pending deletions to remove resources that are pending recreation from the start state. This enables us to plan their creation at the appropriate phase.
I've included a golden test that demonstrates this behavior. I've also modified some existing PostgREST E2Es to test this behavior, since that's where we first noticed the issue.
Background
Resources can sometimes fail partway through the creation process, leaving files or other traces behind on the host. An example of this scenario is when we can create a Swarm service, but it fails to start. Rather than roll back the creation with a delete, which could also be broken or dangerous, we record the resource into the state with the
needs_recreateflag set totrue. This way, if the user deletes the database or makes another update that would delete the resource, we know to run that resource'sDeletemethod to ensure that it's removed from the host.Conversely, we know that we should run this resource's
Createmethod again, rather than itsUpdatemethod, if the user reattempts the operation. In this case, the intent was to callCreatein whichever phase it would normally be recreated if it didn't exist. However, since our planning process only looks at differences between states, it sees resources that need to be recreated in the initial state, even if there aren't any diffs. This led to problems when the resource's normal phase was later in the update plan, such as when you fix a misconfigured supporting service resource.Changes
The core change is in
state.go: modifying the existingMarkPendingDeletionto also handle pending recreations. Most of the rest of this PR is test code.Testing
PLAT-668