@divyenpatel
Member

@divyenpatel commented Dec 6, 2025

What this PR does / why we need it:

This commit adds handling for CnsNotRegisteredFault in various CNS volume
operations for the WORKLOAD cluster flavor. When a volume operation fails
with CnsNotRegisteredFault, the driver now attempts to re-register the
volume with CNS and retries the operation.

Changes include:

  • Add clusterId and clusterDistribution parameters to GetManager() for
    volume re-registration
  • Add ReRegisterVolume() public method to re-register unregistered volumes
  • Add IsCnsNotRegisteredFault() helper to detect the fault type
  • Add IsCnsVolumeAlreadyExistsFault() helper for idempotent re-registration
  • Handle CnsNotRegisteredFault in:
    • AttachVolume
    • DetachVolume
    • DeleteVolume (with improved idempotency)
    • UpdateVolumeMetadata
    • UpdateVolumeCrypto
    • ExpandVolume (with improved idempotency)
    • CreateSnapshot (with improved idempotency and with transaction)
    • DeleteSnapshot
    • RelocateVolume (in migrationController)
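As a rough illustration of what a fault-detection helper such as IsCnsNotRegisteredFault() might do, here is a minimal sketch. The `notRegisteredError` type, `attach`, and `isNotRegistered` are hypothetical stand-ins introduced only for this example; the driver's actual fault types and helper signature are not shown in this description.

```go
package main

import (
	"errors"
	"fmt"
)

// notRegisteredError is a hypothetical stand-in for the error the driver sees
// when CNS reports CnsNotRegisteredFault. The real fault type is not
// reproduced here.
type notRegisteredError struct{ volumeID string }

func (e *notRegisteredError) Error() string {
	return "volume " + e.volumeID + " is not registered with CNS"
}

// attach is a hypothetical operation that always fails with a wrapped
// notRegisteredError, the way a caller might add context to a fault.
func attach(volumeID string) error {
	return fmt.Errorf("attach %s: %w", volumeID, &notRegisteredError{volumeID: volumeID})
}

// isNotRegistered reports whether err, or any error it wraps, is a
// notRegisteredError.
func isNotRegistered(err error) bool {
	var target *notRegisteredError
	return errors.As(err, &target)
}

func main() {
	fmt.Println(isNotRegistered(attach("vol-1")))       // wrapped fault is detected
	fmt.Println(isNotRegistered(errors.New("timeout"))) // unrelated error is not
}
```

Matching with `errors.As` rather than comparing error strings keeps the check working when callers wrap the fault with extra context.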

Re-registration is attempted only once per operation to prevent
infinite retry loops. If re-registration fails or the retry fails, the
original error is returned.
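The retry-once behavior described above can be sketched roughly as follows. This is not the driver's code: `withReRegister`, `errNotRegistered`, and `errAlreadyExists` are hypothetical names, and treating an "already exists" result from re-registration as success is an assumption based on the IsCnsVolumeAlreadyExistsFault() helper listed above.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the two CNS faults named in
// this PR. The driver's real fault types are not reproduced here.
var (
	errNotRegistered = errors.New("CnsNotRegisteredFault")
	errAlreadyExists = errors.New("CnsVolumeAlreadyExistsFault")
)

// withReRegister runs op. If op fails with errNotRegistered, it calls
// reRegister exactly once and then retries op exactly once.
func withReRegister(op, reRegister func() error) error {
	origErr := op()
	if !errors.Is(origErr, errNotRegistered) {
		return origErr // success, or a failure re-registration cannot fix
	}
	// Assumption: "already exists" means the volume is registered, so it is
	// treated as success to keep re-registration idempotent.
	if err := reRegister(); err != nil && !errors.Is(err, errAlreadyExists) {
		return origErr // re-registration failed: surface the original error
	}
	if err := op(); err != nil {
		return origErr // the retry failed too: surface the original error
	}
	return nil
}

func main() {
	attempts := 0
	attach := func() error {
		attempts++
		if attempts == 1 {
			return errNotRegistered // first call hits the fault
		}
		return nil // the retry succeeds
	}
	reRegister := func() error { return errAlreadyExists }

	err := withReRegister(attach, reRegister)
	fmt.Println(err, attempts)
}
```

Because the function contains no loop, an operation that keeps returning the fault is tried at most twice and re-registration runs at most once.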

Testing done:

See the tests and logs here: https://gist.github.com/divyenpatel/c48fddcc2bb7323fbbd168d4e035ff6a

pre-checkin runs:

Special notes for your reviewer:

Release note:

Handle CnsNotRegisteredFault by re-registering the volume and retrying the operation.

@k8s-ci-robot added the do-not-merge/work-in-progress label on Dec 6, 2025
@k8s-ci-robot added the cncf-cla: yes, approved, and size/L labels on Dec 6, 2025
@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch 4 times, most recently from bd3e2f8 to 8469478, on December 11, 2025 00:06
@divyenpatel changed the title from "[WIP] handle CnsNotRegisteredFault and re-register volume" to "handle CnsNotRegisteredFault, re-register volume and continue volume operations" on Dec 11, 2025
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Dec 11, 2025
@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch from 8469478 to 49fc6ee on December 14, 2025 19:26
@k8s-ci-robot added the size/XL label and removed the size/L label on Dec 14, 2025
@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch from 49fc6ee to a2961cf on December 14, 2025 19:33
@k8s-ci-robot added the size/L and needs-rebase labels and removed the size/XL label on Dec 14, 2025
@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch from a2961cf to 2206fdb on December 19, 2025 06:59
@k8s-ci-robot added the size/XL label and removed the needs-rebase and size/L labels on Dec 19, 2025
@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch from 2206fdb to 7372aba on December 19, 2025 07:33
@k8s-ci-robot added the size/L label and removed the size/XL label on Dec 19, 2025
@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch 2 times, most recently from 28411f8 to cc3ea00, on December 19, 2025 07:46
@deepakkinni
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #774

@deepakkinni
Collaborator

FAILED --- Jenkins Build #774

Contributor

@skogta left a comment

Have some minor comments. Overall looks good to me.

@deepakkinni
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #784

@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch from cc3ea00 to 394b218 on December 19, 2025 18:39
@k8s-ci-robot added the size/XL label and removed the size/L label on Dec 19, 2025
@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch 2 times, most recently from 926700c to 90264e2, on December 19, 2025 19:17
@deepakkinni
Collaborator

FAILED --- Jenkins Build #785

@deepakkinni
Collaborator

FAILED --- Jenkins Build #309

@deepakkinni
Collaborator

FAILED --- Jenkins Build #795

@deepakkinni
Collaborator

FAILED --- Jenkins Build #735

@deepakkinni
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #744

@deepakkinni
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #746

@deepakkinni
Collaborator

FAILED --- Jenkins Build #746

@deepakkinni
Collaborator

FAILED --- Jenkins Build #747

@deepakkinni
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #751

@deepakkinni
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #753

@deepakkinni
Collaborator

FAILED --- Jenkins Build #753

@divyenpatel force-pushed the handle-CnsNotRegisteredFault branch from 90264e2 to 7a6be88 on January 8, 2026 23:31
@deepakkinni
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #856

@deepakkinni
Collaborator

/approve
/lgtm

@k8s-ci-robot added the lgtm label on Jan 9, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepakkinni, divyenpatel, skogta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [deepakkinni,divyenpatel]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit 4ca9e99 into kubernetes-sigs:master on Jan 9, 2026
12 checks passed
Labels

approved, cncf-cla: yes, lgtm, size/XL

4 participants