Skip to content

fix: propagate numCores and extraCores from wekacluster to wekacontainers per role (OP-260)#2395

Closed
kristina-solovyova wants to merge 1 commit into04-08-fix_ensure_hugepages_spec_hash_reflects_dpdk_base_memory_values_for_correct_change_detectionfrom
03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role
Closed

fix: propagate numCores and extraCores from wekacluster to wekacontainers per role (OP-260)#2395
kristina-solovyova wants to merge 1 commit into04-08-fix_ensure_hugepages_spec_hash_reflects_dpdk_base_memory_values_for_correct_change_detectionfrom
03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role

Conversation

@kristina-solovyova
Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 25, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA ff0a024.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copy link
Copy Markdown
Collaborator Author

kristina-solovyova commented Mar 25, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label main-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@kristina-solovyova kristina-solovyova changed the base branch from main to graphite-base/2395 March 25, 2026 12:32
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from abce97b to 50df2fb Compare March 25, 2026 12:32
@kristina-solovyova kristina-solovyova changed the base branch from graphite-base/2395 to 01-21-feat_add_smbw_support March 25, 2026 12:32
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 50df2fb to 10dbe9f Compare March 25, 2026 12:45
@assafgi assafgi changed the base branch from 01-21-feat_add_smbw_support to graphite-base/2395 March 25, 2026 12:45
@weka weka deleted a comment from claude Bot Mar 25, 2026
@weka weka deleted a comment from claude Bot Mar 25, 2026
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 10dbe9f to a5f51e7 Compare March 25, 2026 12:53
@kristina-solovyova kristina-solovyova changed the base branch from graphite-base/2395 to 01-21-feat_add_smbw_support March 25, 2026 12:53
@kristina-solovyova kristina-solovyova marked this pull request as ready for review March 25, 2026 12:53
@kristina-solovyova kristina-solovyova requested a review from a team as a code owner March 25, 2026 12:53
@claude
Copy link
Copy Markdown

claude Bot commented Mar 25, 2026

Claude finished @kristina-solovyova's task in 1m 35s —— View job


Code Review

  • Gather context (parent PR diff, changed files)
  • Review the changed file
  • Post review feedback

Summary

This PR does two things: (1) fixes a real bug where NumCores was never propagated from WekaCluster to WekaContainer specs, and (2) refactors the repeated switch role statements into a unified forRole() helper. The changes are clean and well-motivated.


Observations

Bug fix: NumCores propagation (funcs_upgrade.go:399)

NumCores was simply absent from HandleSpecUpdates. The fix is correct — GetNonZeroOrDefault preserves existing container values when the cluster spec has 0 (unset), which is the right semantic.

container.Spec.NumCores = util.GetNonZeroOrDefault(rv.NumCores, container.Spec.NumCores)

One implicit design decision worth documenting: once NumCores is set on a container, it can never be reduced to 0 via the cluster spec (since 0 is treated as "not set"). This appears intentional but is worth a comment if not already covered elsewhere.


Overrides refactor (funcs_upgrade.go:296–346)

The old pattern fetched overrides multiple times and conditionally wrote back. The new pattern fetches once, mutates, and unconditionally writes back at the end:

overrides := container.Spec.GetOverrides()
// ... modify fields ...
container.Spec.Overrides = overrides  // always assigned

This relies on client.MergeFrom to suppress no-op API calls, which is correct and idiomatic. No functional regression.

One subtle behavior change: UpgradePreventEviction on compute containers is now always written (even when value didn't change). This is fine since the patch mechanism handles it, but the old code's explicit guards made the intent clearer.


forRole() helper (funcs_upgrade.go:45–61)

Clean consolidation. The fallthrough to return containerRoleSpec{} for unknown roles is correct — unknown roles will get zero values and GetNonZeroOrDefault will preserve existing container fields.


dynamicTemplate nil-guard removal (funcs_upgrade.go:127–144)

The old code had a manual nil check:

// old
if spec.Dynamic != nil {
    computeExtraCores = spec.Dynamic.ComputeExtraCores
}

The new code reads from dynamicTemplate, which is always non-nil (falls back to the default template from GetWekaClusterTemplate). When spec.Dynamic == nil, the default template returns 0 for ExtraCores fields, so behavior is preserved.


Potential issue: SmbwHugepages default uses tmpl.Cores.Smbw (funcs_upgrade.go:224)

SmbwHugepages: util.GetNonZeroOrDefault(dynamicTemplate.SmbwFrontendHugepages, 1400*tmpl.Cores.Smbw),

Note tmpl (the default template from GetWekaClusterTemplate(spec.Dynamic)) is used rather than dynamicTemplate for the Smbw core count. This is intentional — you want the core count from the resolved template, not the raw config — but it's subtle. If spec.Dynamic specifies SmbwCores but SmbwFrontendHugepages is unset, the default is computed from the resolved core count, which is correct. No issue here, just worth noting.


Verdict

The refactoring is solid — no regressions visible, the NumCores propagation fix is the key value, and Smbw support is properly wired in throughout. The PR is ready to merge after the stack is resolved.

@graphite-app graphite-app Bot requested review from assafgi and tigrawap March 25, 2026 12:54
@graphite-app
Copy link
Copy Markdown

graphite-app Bot commented Mar 25, 2026

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (03/25/26)

2 reviewers were added to this PR based on Anton Bykov's automation.

@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from a5f51e7 to 716e161 Compare March 25, 2026 12:55
@claude
Copy link
Copy Markdown

claude Bot commented Mar 25, 2026

Claude finished @kristina-solovyova's task in 31s —— View job


No changes since the last review — the PR still contains the same single commit (716e1611). See the previous review above for full feedback.

@kristina-solovyova kristina-solovyova changed the title fix: propagate numCores and extraCores from wekacluster to wekacontainers per role fix: propagate numCores and extraCores from wekacluster to wekacontainers per role (OP-260) Mar 25, 2026
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 716e161 to 97c6000 Compare March 25, 2026 13:44
@claude
Copy link
Copy Markdown

claude Bot commented Mar 25, 2026

Claude finished @kristina-solovyova's task in 1m 39s —— View job


  • Check for new changes since last review
  • Review changes
  • Post review feedback

Code Review (updated commit 97c60007)

The commit was amended since the last review. Key changes from the prior version:


New: explicit NumCores increase-only semantics (funcs_upgrade.go:399–402)

// NumCores can only be increased — decreasing requires manual intervention (container restart/reconfiguration).
if rv.NumCores > container.Spec.NumCores {
    container.Spec.NumCores = rv.NumCores
}

The previous version used GetNonZeroOrDefault (which would silently preserve existing values when cluster spec has 0). This version uses a > guard, which is more explicit and correct: it documents intent, allows reducing to a positive but lower value by hand, and still won't trigger on unset (0) values in the cluster spec. Good improvement.


New: ExtraCores vs NumCores semantic comment (funcs_upgrade.go:394–395)

// ExtraCores allows zero (explicit removal), so uses equality guard.
// NumCores and Hugepages treat zero as "not set" — only update when non-zero.

This directly addresses the implicit design decision flagged in the previous review. No changes needed.


New: Smbw fully wired throughout

SmbwCores, SmbwExtraCores, SmbwHugepages, SmbwHugepagesOffset are now added to UpdatableClusterSpec, populated from dynamicTemplate in NewUpdatableClusterSpec, and included in forRole(). The SmbwHugepages default (1400 * tmpl.Cores.Smbw) uses the resolved template core count, which is correct.

One edge case: if tmpl.Cores.Smbw == 0 (no Smbw cores configured), the default hugepages will be 0. GetNonZeroOrDefault will then fall back to the container's existing value, so no regression — but it's worth confirming this is intended when Smbw is not configured.


DriversBuildId comparison simplified (funcs_upgrade.go:410–412)

oldId, newId := container.Spec.DriversBuildId, updatableSpec.DriversBuildId
if (oldId == nil) != (newId == nil) || (oldId != nil && *oldId != *newId) {

Cleaner than the previous triple-condition. Logically equivalent.


Verdict

All issues from the previous review are addressed. The Smbw propagation is complete, NumCores semantics are correctly documented and implemented, and the refactoring is clean. Ready to merge once the stack is resolved.

Base automatically changed from 01-21-feat_add_smbw_support to main March 25, 2026 17:10
@kristina-solovyova kristina-solovyova changed the base branch from main to graphite-base/2395 April 8, 2026 21:31
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 97c6000 to 75445b1 Compare April 8, 2026 21:31
@kristina-solovyova kristina-solovyova changed the base branch from graphite-base/2395 to 04-08-fix_ensure_hugepages_spec_hash_reflects_dpdk_base_memory_values_for_correct_change_detection April 8, 2026 21:33
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 75445b1 to ff0a024 Compare April 9, 2026 10:35
@kristina-solovyova kristina-solovyova force-pushed the 04-08-fix_ensure_hugepages_spec_hash_reflects_dpdk_base_memory_values_for_correct_change_detection branch from eb66369 to 0baec97 Compare April 9, 2026 10:35
@claude
Copy link
Copy Markdown

claude Bot commented Apr 9, 2026

Claude finished @kristina-solovyova's task in 2m 7s —— View job


Code Review


Summary

This synchronization event was triggered by a rebase onto the updated parent PR (#2439). The commit was force-pushed (amended from 97c60007 to ff0a0246), but the functional content is unchanged from the previous review.

No new code changes to review — the previous review (2026-03-25) covers this PR in full and concluded it was ready to merge.


Verified: No Regressions Against Updated Parent

Parent PR #2439 added hugepagesIsUserSet/hugepagesOffsetIsUserSet tracking in GetContainerHugepages — user-set values now represent the total (weka + DPDK), so DPDK is NOT added on top. This PR propagates those computed hugepages values via rv.Hugepages (from UpdatableClusterSpec.*Hugepages) without adding any DPDK on top again. The two PRs compose correctly.


Verdict

Still ready to merge once the stack is resolved.

@graphite-app
Copy link
Copy Markdown

graphite-app Bot commented Apr 11, 2026

Merge activity

  • Apr 11, 7:13 PM UTC: kristina-solovyova added this pull request to the Graphite merge queue.
  • Apr 11, 7:14 PM UTC: CI is running for this pull request on a draft pull request (#2446) due to your merge queue CI optimization settings.
  • Apr 11, 7:51 PM UTC: Merged by the Graphite merge queue via draft PR: #2446.

@graphite-app graphite-app Bot closed this Apr 11, 2026
@graphite-app graphite-app Bot deleted the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch April 11, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants