Skip to content

fix: prevent refreshAliasesTask from reverting rotated aliases in aws_kms plugin#6805

Open
angabini wants to merge 4 commits intospiffe:mainfrom
angabini:fix-aws-kms-refresh-alias-race
Open

fix: prevent refreshAliasesTask from reverting rotated aliases in aws_kms plugin#6805
angabini wants to merge 4 commits intospiffe:mainfrom
angabini:fix-aws-kms-refresh-alias-race

Conversation

@angabini
Copy link
Copy Markdown

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality

aws_kms KeyManager plugin — specifically the refreshAliasesTask background task that periodically refreshes alias timestamps.

Description of change

In multi-replica SPIRE server deployments sharing a key_identifier_value, refreshAliasesTask on non-leader replicas can revert a KMS alias to a stale cached key after the leader rotates it. If the old key is subsequently scheduled for deletion, the alias ends up pointing to a disabled key, causing all future server startups to crash with FailedPrecondition.

This PR adds a DescribeKey check in refreshAliases() before calling UpdateAlias. If the alias's current target differs from the local cache (indicating a rotation by another replica), the refresh is skipped to avoid reverting the rotation.

Which issue this PR fixes

fixes #6804

…_kms plugin

Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
@amartinezfayo amartinezfayo self-assigned this Mar 31, 2026
Copy link
Copy Markdown
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @angabini for this contribution!

}
if describeResp == nil || describeResp.KeyMetadata == nil || describeResp.KeyMetadata.Arn == nil {
p.log.Error("Malformed describe key response during alias refresh", aliasNameTag, entry.AliasName)
continue
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this path should also append to errs, similar to the DescribeKey error path.

Copy link
Copy Markdown
Author

@angabini angabini Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. does it look ok?

require.Equal(t, "key_id_rotated", *aliasAfter.KeyEntry.KeyID, "alias should not have been reverted to the stale cached key")
}

func TestRefreshAliasesDescribeKeyError(t *testing.T) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could also add a test for the malformed response branch (where DescribeKey succeeds but returns nil KeyMetadata or nil Arn)?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

angabini added a commit to angabini/spire that referenced this pull request Apr 22, 2026
Address review feedback on spiffe#6805: the malformed-response branch in
refreshAliases now appends to errs so the periodic task surfaces the
problem, matching the DescribeKey error path. Adds a unit test that
covers the new branch using a new fake-client knob.

Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
Made-with: Cursor
Address review feedback on spiffe#6805: the malformed-response branch in
refreshAliases now appends to errs so the periodic task surfaces the
problem, matching the DescribeKey error path. Adds a unit test that
covers the new branch using a new fake-client knob.

Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
@angabini angabini force-pushed the fix-aws-kms-refresh-alias-race branch from 6f1991d to 2916541 Compare April 22, 2026 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

aws_kms: refreshAliasesTask can revert rotated aliases in multi-replica deployments

2 participants