fix: implement OnEventStatusHandler.GetOperationStatus for DataLoad and DataMigrate#5945
Conversation
…nd DataMigrate OnEventStatusHandler.GetOperationStatus was a stub returning nil, nil in both dataload and datamigrate packages. Since OnEvent is a valid selectable policy, this caused silent status loss at runtime - the operation status was overwritten with nil with no error surfaced. Implement the handler following the same pattern as OnceStatusHandler: look up the triggered job, check its finished condition, and return the correct phase, conditions and duration. Also replace the stub tests with real coverage for complete, failed and still-running job scenarios. Fixes fluid-cloudnative#5944 Signed-off-by: Aditya Upasani <adityaupasani29@gmail.com>
|
Hi @adityaupasani2. Thanks for your PR. I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Code Review
This pull request implements the GetOperationStatus method for OnEventStatusHandler in both the dataload and datamigrate controllers, replacing the previous TODO stubs. It also adds comprehensive unit tests to verify the status handling logic under different job conditions. The review feedback highlights a missing return statement in the dataload status handler when a missing job's helm release is successfully deleted, which could lead to logging a nil error and incorrect execution flow. Additionally, it is recommended to use ctx.Log instead of o.Log in the datamigrate status handler for consistency and proper context propagation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if err != nil { | ||
| if utils.IgnoreNotFound(err) == nil { | ||
| ctx.Log.Info("Related Job missing, will delete helm chart and retry", "namespace", ctx.Namespace, "jobName", jobName) | ||
| if err = helm.DeleteReleaseIfExists(releaseName, ctx.Namespace); err != nil { | ||
| ctx.Log.Error(err, "can't delete dataload release", "namespace", ctx.Namespace, "releaseName", releaseName) | ||
| return | ||
| } | ||
| } | ||
| ctx.Log.Error(err, "can't get dataload job", "namespace", ctx.Namespace, "jobName", jobName) | ||
| return | ||
| } |
There was a problem hiding this comment.
There is a missing return statement when the related job is not found and the helm release is successfully deleted. If helm.DeleteReleaseIfExists succeeds, err is set to nil. Without an early return, the execution flows to ctx.Log.Error(err, ...) which will log a nil error as an error, and then return result, nil instead of returning early.
| if err != nil { | |
| if utils.IgnoreNotFound(err) == nil { | |
| ctx.Log.Info("Related Job missing, will delete helm chart and retry", "namespace", ctx.Namespace, "jobName", jobName) | |
| if err = helm.DeleteReleaseIfExists(releaseName, ctx.Namespace); err != nil { | |
| ctx.Log.Error(err, "can't delete dataload release", "namespace", ctx.Namespace, "releaseName", releaseName) | |
| return | |
| } | |
| } | |
| ctx.Log.Error(err, "can't get dataload job", "namespace", ctx.Namespace, "jobName", jobName) | |
| return | |
| } | |
| if err != nil { | |
| if utils.IgnoreNotFound(err) == nil { | |
| ctx.Log.Info("Related Job missing, will delete helm chart and retry", "namespace", ctx.Namespace, "jobName", jobName) | |
| if err = helm.DeleteReleaseIfExists(releaseName, ctx.Namespace); err != nil { | |
| ctx.Log.Error(err, "can't delete dataload release", "namespace", ctx.Namespace, "releaseName", releaseName) | |
| return | |
| } | |
| return | |
| } | |
| ctx.Log.Error(err, "can't get dataload job", "namespace", ctx.Namespace, "jobName", jobName) | |
| return | |
| } |
| if err = helm.DeleteReleaseIfExists(releaseName, ctx.Namespace); err != nil { | ||
| o.Log.Error(err, "can't delete DataMigrate release", "namespace", ctx.Namespace, "releaseName", releaseName) | ||
| return | ||
| } |
There was a problem hiding this comment.
For consistency and proper context propagation, use ctx.Log instead of o.Log to log the error when deleting the release fails.
| if err = helm.DeleteReleaseIfExists(releaseName, ctx.Namespace); err != nil { | |
| o.Log.Error(err, "can't delete DataMigrate release", "namespace", ctx.Namespace, "releaseName", releaseName) | |
| return | |
| } | |
| if err = helm.DeleteReleaseIfExists(releaseName, ctx.Namespace); err != nil { | |
| ctx.Log.Error(err, "can't delete DataMigrate release", "namespace", ctx.Namespace, "releaseName", releaseName) | |
| return | |
| } |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5945 +/- ##
=======================================
Coverage 63.35% 63.35%
=======================================
Files 481 481
Lines 33070 33070
=======================================
Hits 20953 20953
Misses 10442 10442
Partials 1675 1675 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…andler Add explicit return after successfully deleting the helm release when the job is not found, consistent with the DataMigrate implementation. Addresses review comment on fluid-cloudnative#5945 Signed-off-by: Aditya Upasani <adityaupasani29@gmail.com>
|
New changes are detected. LGTM label has been removed. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@cheyang Addressed the review nit added the missing return after helm release deletion in dataload/status_handler.go to match the DataMigrate pattern. |
…rate Use ctx.Log for consistency with other handlers and proper context propagation when logging helm release deletion errors. Addresses Gemini bot review comment on fluid-cloudnative#5945 Signed-off-by: Aditya Upasani <adityaupasani29@gmail.com>
|
Addressed Gemini bot's review comment changed /assign @cheyang |
| @@ -188,6 +188,54 @@ func (c *CronStatusHandler) GetOperationStatus(ctx cruntime.ReconcileRequestCont | |||
| } | |||
|
|
|||
| func (o *OnEventStatusHandler) GetOperationStatus(ctx cruntime.ReconcileRequestContext, opStatus *datav1alpha1.OperationStatus) (result *datav1alpha1.OperationStatus, err error) { | |||
There was a problem hiding this comment.
This is almost a line-for-line copy of OnceStatusHandler.GetOperationStatus above (lines 54-111), and the same duplication shows up in the datamigrate package. Could the shared body be pulled into a small helper (e.g. one that takes the client plus releaseName/jobName and returns the updated OperationStatus) so the two handlers do not drift over time? Not blocking, but it would make future status-handling fixes a single edit instead of four.
There was a problem hiding this comment.
Acknowledged . I'll keep this fix focused and open a separate issue + PR for the refactor to avoid expanding scope here. Happy to do it as a follow-up once this merges.
| result = opStatus.DeepCopy() | ||
| object := o.dataMigrate | ||
| releaseName := utils.GetDataMigrateReleaseName(object.GetName()) | ||
| jobName := utils.GetDataMigrateJobName(releaseName) |
There was a problem hiding this comment.
GetJob looks the job up with object.GetNamespace(), but the missing-job branch logs and deletes the release using ctx.Namespace. This matches the existing Once handler so I won't block on it, but it would be more robust to use a single namespace source throughout.
| ctx.Log.Error(err, "can't delete dataload release", "namespace", ctx.Namespace, "releaseName", releaseName) | ||
| return | ||
| } | ||
| return |
There was a problem hiding this comment.
Good call adding the explicit return here after DeleteReleaseIfExists succeeds. The OnceStatusHandler version falls through and ends up logging a nil error, so this branch is genuinely cleaner and matches the Cron handler. Worth keeping.
| ctx.Log.V(1).Info("DataLoad chart already existed, check its running status") | ||
| job, err := kubeclient.GetJob(o.Client, jobName, ctx.Namespace) | ||
| if err != nil { | ||
| if utils.IgnoreNotFound(err) == nil { |
There was a problem hiding this comment.
This job-missing branch (the helm.DeleteReleaseIfExists call plus early return) isn't exercised by any test. Since it's a real control-flow path, a case where the job is NotFound would be worth adding.
There was a problem hiding this comment.
The job-missing path is hard to cover in unit tests because DeleteReleaseIfExists shells out to the ddc-helm binary which isn't present in the test environment the same gap exists in OnceStatusHandler. I attempted to add a NotFound test but it fails with exec: "ddc-helm": executable file not found. Happy to add it if there's a recommended way to mock helm calls in this codebase, otherwise I'll leave it consistent with the existing OnceStatusHandler pattern.
| } | ||
| isJobSucceed := finishedJobCondition.Type == batchv1.JobComplete | ||
|
|
||
| if result.NodeAffinity == nil && isJobSucceed { |
There was a problem hiding this comment.
The success-path tests use jobs without the affinity-inject annotation, so GenerateNodeAffinity short-circuits to (nil, nil) and this label-generation block (plus its wrapped-error path) never actually runs. A test with the annotation present would cover it.
| } | ||
|
|
||
| isJobSucceed := finishedJobCondition.Type == batchv1.JobComplete | ||
| if o.dataMigrate.Spec.Parallelism == 1 && result.NodeAffinity == nil && isJobSucceed { |
There was a problem hiding this comment.
Only Parallelism == 1 is tested, so the branch that skips NodeAffinity for parallel migrations isn't verified. A Parallelism > 1 case would lock this guard in.
| fakeClient := fake.NewFakeClientWithScheme(testScheme, dm) | ||
| handler := &OnEventStatusHandler{Client: fakeClient, dataMigrate: dm} | ||
| ctx := cruntime.ReconcileRequestContext{Log: fake.NullLogger()} | ||
| var ( |
There was a problem hiding this comment.
While you're touching this file: the header reads Copyright 2026 whereas its siblings use 2023. A mismatched license header is a likely cause of the failing Project Check, so it's worth aligning and ruling out before the next CI run.
- Fix copyright year 2026 -> 2023 in datamigrate status_handler_test.go (was causing Project Check / lint failure) - Run gofmt on implementation files to fix formatting - Add NodeAffinity generation test with affinity annotation for DataLoad - Add Parallelism > 1 test to verify NodeAffinity is skipped for parallel DataMigrate operations - Use object.GetNamespace() consistently instead of ctx.Namespace in DataMigrate OnEventStatusHandler for correctness Addresses review comments by cheyang on fluid-cloudnative#5945 Signed-off-by: Aditya Upasani <adityaupasani29@gmail.com>
|



What this PR does
Fixes #5944
OnEventStatusHandler.GetOperationStatuswas an unimplemented stub returningnil, nilin bothdataloadanddatamigratepackages. SinceOnEventis a valid selectable policy, this caused silent status loss at runtime — the operation status was overwritten withnilwith no error surfaced to the user.Root cause
In
pkg/ddc/base/operation.go, the result ofGetOperationStatusis used directly:errisnilso no error is loggedopStatusToUpdateisnilreflect.DeepEqual(opStatus, nil)is false, soUpdateOperationApiStatus(nil)is calledFix
Implement
OnEventStatusHandler.GetOperationStatusin both packages following the same pattern asOnceStatusHandler— look up the triggered job, check its finished condition, and return the correct phase, conditions and duration.Tests
Replace the stub tests (
returns nil result and nil error) with real coverage:PhaseCompletePhaseFailed