Conversation
umswmayj
commented
Mar 27, 2026
- more logging
- better debug cli tool
- default config is more active
* more logging * better debug cli tool * default config is more active
📝 WalkthroughWalkthroughUpdates Helm scheduling defaults, adds missing-VM tracking/logging in the failover controller, and extends the visualize-reservations tool with config-driven multi-context support and optional kubectl port-forwarding for Postgres connections. Changes
Sequence DiagramssequenceDiagram
participant User as User/CLI
participant Main as visualize-reservations
participant Parser as Config Parser
participant K8s as K8s API (per-context clients)
participant PF as kubectl port‑forward
participant Postgres as Postgres DB
User->>Main: start CLI (flags / --config)
Main->>Parser: load config + apply flag overrides
Parser-->>Main: contexts, options
loop for each hypervisor context
Main->>K8s: get/create cached client for context
K8s-->>Main: client
Main->>K8s: list HypervisorList
K8s-->>Main: hypervisors
end
alt port‑forward enabled
Main->>PF: start kubectl port‑forward (probe)
PF-->>Main: ready
Main->>Postgres: connect to localhost:forwarded_port
else direct connect
Main->>Postgres: connect to configured host:port
end
Postgres-->>Main: connection established / ping
loop for each reservation context
Main->>K8s: get/create cached client for context
K8s-->>Main: client
Main->>K8s: list ReservationList
K8s-->>Main: reservations
end
Main->>Main: aggregate & optionally rewrite names (`@context`)
Main-->>User: print summary
Main->>PF: cleanup (defer)
PF-->>Main: subprocess terminated
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Test Coverage ReportTest Coverage 📊: 68.1% |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/failover/controller.go`:
- Line 272: Update the TODO comment in the failover controller so it uses
correct spelling and conventional casing: change "todo: vms are vms from all
AZs, we should consdier processing them by AZ (sequencial or in parallel) but
not mixing them together" to "TODO: vms are vms from all AZs; we should consider
processing them by AZ (sequentially or in parallel) but not mixing them
together" — modify the comment near the failover controller's VM-processing
logic (mentions "vms", "AZs") accordingly.
In `@tools/visualize-reservations/main.go`:
- Around line 610-613: The code treats an empty allReservations.Items as an
error; change the check so we only exit when the set is empty AND there were
read errors. Modify the block around allReservations and reservationReadErrors
to only print the error and return when len(allReservations.Items) == 0 &&
reservationReadErrors > 0 (so a legitimately empty result with zero read errors
is allowed to continue and render the hypervisor/summary output).
- Around line 604-607: The loop is mutating reservation.Name via
annotateReservationName(reservation.Name, resCtx, showReservationContextInName),
causing double-qualification like name@context@host; instead preserve the
original Name and carry context separately: in the block that iterates
reservationsInContext.Items (and where resCopy is created), stop writing into
resCopy.Name and instead set a new metadata field (e.g.,
resCopy.Annotations["reservationContext"] or a composite key field on the
reservation struct) that stores resCtx or the annotated display name; update any
downstream code that builds reservation references (the code that appends
"@host") to read the context from this new metadata key (or compose using both
Name and reservationContext) rather than from Name; keep function
annotateReservationName only for display purposes and adjust consumers to use
the separate context metadata instead of relying on mutated Name.
- Around line 310-322: The current readiness loop uses net.DialTimeout to
127.0.0.1:localPort which only proves "something" is listening and can return
success when the local port was already occupied; instead detect kubectl
port-forward readiness and fail if the port was already in use or the
port-forward child exits. After starting the port-forward process, replace the
bare Dial probe with: (1) monitor the port-forward process (the exec.Cmd you
spawn) for early exit (use cmd.Wait in a goroutine and check its error), (2)
read the command's stdout/stderr and wait for the explicit "Forwarding from" /
"Handling connection" ready message before declaring readiness, and (3)
optionally perform an application-level probe to the forwarded endpoint if
appropriate; keep cleanup() invoked and return an error if the child process
exits or if stderr indicates "address already in use" so you fail fast instead
of treating net.DialTimeout success as readiness.
- Around line 394-447: The current code checks for "unset" flags by comparing
flag values to defaults (e.g., sortBy vs defaultSort, viewsFlag vs defaultViews,
postgresSecret, namespace, postgresHostOverride, postgresPortOverride, hideFlag,
filterName, filterTrait, hypervisorContext, hypervisorContexts,
reservationContext, reservationContexts, postgresContext, postgresPortForward,
postgresPortForwardService, postgresPortForwardLocalPort,
postgresPortForwardRemotePort) which incorrectly treats an explicitly provided
flag equal to its default as "unset"; instead, use flag.Visit() to build a
set/map of explicitly visited flag names at startup and only apply cfg.* values
to variables when that flag name is NOT in the visited set so explicit CLI flags
always take precedence over the config file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bd2473cc-26ec-4a71-a7ce-5127bf9cad5c
📒 Files selected for processing (3)
helm/bundles/cortex-nova/values.yamlinternal/scheduling/reservations/failover/controller.gotools/visualize-reservations/main.go
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
tools/visualize-reservations/main.go (4)
611-613:⚠️ Potential issue | 🟡 MinorDon't abort on a valid empty reservation list.
If every
Listsucceeds and the cluster simply has zero reservations, this still exits and skips the rest of the report. Only return early whenlen(allReservations.Items) == 0andreservationReadErrors > 0.💡 Minimal fix
-if len(allReservations.Items) == 0 { +if len(allReservations.Items) == 0 && reservationReadErrors > 0 {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/visualize-reservations/main.go` around lines 611 - 613, The early return aborts even when every List succeeded but there are zero reservations; change the condition in the block that checks len(allReservations.Items) to only treat it as an error when reservationReadErrors > 0. Update the if from "if len(allReservations.Items) == 0 { ... return }" to "if len(allReservations.Items) == 0 && reservationReadErrors > 0 { fmt.Fprintf(..., \"Error: could not list reservations from any context (errors: %d)\\n\", reservationReadErrors); return }" so that an empty but-successful result proceeds; reference allReservations.Items and reservationReadErrors in the change.
395-448:⚠️ Potential issue | 🟠 MajorTrack explicitly visited flags instead of comparing against defaults.
This still treats values like
--sort=vm,--views=all,--postgres-secret=cortex-nova-postgres, and--postgres-port-forward=falseas “unset”, so the JSON config can override an explicit CLI choice. Build a visited set withflag.Visit()and only applycfg.*when that flag name was not visited.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/visualize-reservations/main.go` around lines 395 - 448, The current code overwrites CLI flags with cfg values by comparing against defaults, which incorrectly treats explicit CLI values (e.g., sortBy, viewsFlag, postgresSecret, postgresPortForward) as unset; change this to track explicitly visited flags using flag.Visit() to build a visited set (map[string]bool) and then only apply each cfg.* override when the corresponding flag name is NOT in that visited set for variables like sortBy, postgresSecret, namespace, postgresHostOverride, postgresPortOverride, viewsFlag, hideFlag, filterName, filterTrait, hypervisorContext, hypervisorContexts, reservationContext, reservationContexts, postgresContext, postgresPortForward, postgresPortForwardService, postgresPortForwardLocalPort, postgresPortForwardRemotePort so CLI-provided values always win over config.
298-325:⚠️ Potential issue | 🟠 MajorDon't use a bare TCP dial as the port-forward readiness check.
A successful dial here only proves that something is listening on
127.0.0.1:localPort. If that port was already occupied, this loop reports ready whilekubectl port-forwardhas already failed and the tool queries the wrong endpoint. Wait for the child process/read its output and fail fast on early exit oraddress already in use.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/visualize-reservations/main.go` around lines 298 - 325, The readiness loop uses a bare net.DialTimeout which can falsely succeed if another process is listening; modify the port-forward startup (where exec.Command is created and cmd is started) to capture and monitor the child process output and exit state: create pipes via cmd.StdoutPipe()/cmd.StderrPipe() (instead of assigning os.Stderr), start a goroutine that reads those pipes and watches for fatal errors like "address already in use" or an early exit by calling cmd.Wait() and sending any error on a channel, then in the readiness loop (the code using net.DialTimeout and readyDeadline) also select on that error channel so you fail fast if kubectl exits or reports bind errors; keep the existing cleanup closure (cleanup, cmd.Process.Kill/Wait) but ensure you return the observed error from the watcher rather than reporting ready when the child has failed.
605-608:⚠️ Potential issue | 🟠 MajorKeep context separate from
Reservation.Name.Rewriting
resCopy.Nametoname@contextbreaks later code that still treatsNameas the original object identifier. At Line 740 and Line 901 this becomesname@context@host, which also breaks theres-hostsort path that expects exactly one@. PreserveNameand carry the context/display label separately.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/visualize-reservations/main.go` around lines 605 - 608, The code currently overwrites resCopy.Name with annotateReservationName(reservation.Name, ...) which corrupts the original identifier; instead preserve reservation.Name and store the annotated label separately (e.g., set resCopy.AnnotatedName or resCopy.DisplayName or an annotation map entry like resCopy.Annotations["displayName"] = annotateReservationName(...)) before appending to allReservations.Items; then update downstream code that currently concatenates Name (the callers around the places that build name@context@host) to use the new AnnotatedName/DisplayName for display purposes while keeping Name untouched for identity-based logic and sort paths that expect exactly one '@'.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/visualize-reservations/main.go`:
- Around line 1690-1693: The code currently checks if host == "" and
!enablePortForward and returns before applying direct overrides, preventing
--postgres-host/--postgres-port from recovering a blank secret; move the logic
that applies hostOverride and portOverride (the code that assigns to host and
port from hostOverride/portOverride) to run before the empty-host validation
(the block that checks host == "" && !enablePortForward) — do the same fix for
the second occurrence around the 1726-1733 region so validation happens after
overrides are applied, or alternatively change the validation to re-check host
after applying hostOverride/portOverride.
---
Duplicate comments:
In `@tools/visualize-reservations/main.go`:
- Around line 611-613: The early return aborts even when every List succeeded
but there are zero reservations; change the condition in the block that checks
len(allReservations.Items) to only treat it as an error when
reservationReadErrors > 0. Update the if from "if len(allReservations.Items) ==
0 { ... return }" to "if len(allReservations.Items) == 0 &&
reservationReadErrors > 0 { fmt.Fprintf(..., \"Error: could not list
reservations from any context (errors: %d)\\n\", reservationReadErrors); return
}" so that an empty but-successful result proceeds; reference
allReservations.Items and reservationReadErrors in the change.
- Around line 395-448: The current code overwrites CLI flags with cfg values by
comparing against defaults, which incorrectly treats explicit CLI values (e.g.,
sortBy, viewsFlag, postgresSecret, postgresPortForward) as unset; change this to
track explicitly visited flags using flag.Visit() to build a visited set
(map[string]bool) and then only apply each cfg.* override when the corresponding
flag name is NOT in that visited set for variables like sortBy, postgresSecret,
namespace, postgresHostOverride, postgresPortOverride, viewsFlag, hideFlag,
filterName, filterTrait, hypervisorContext, hypervisorContexts,
reservationContext, reservationContexts, postgresContext, postgresPortForward,
postgresPortForwardService, postgresPortForwardLocalPort,
postgresPortForwardRemotePort so CLI-provided values always win over config.
- Around line 298-325: The readiness loop uses a bare net.DialTimeout which can
falsely succeed if another process is listening; modify the port-forward startup
(where exec.Command is created and cmd is started) to capture and monitor the
child process output and exit state: create pipes via
cmd.StdoutPipe()/cmd.StderrPipe() (instead of assigning os.Stderr), start a
goroutine that reads those pipes and watches for fatal errors like "address
already in use" or an early exit by calling cmd.Wait() and sending any error on
a channel, then in the readiness loop (the code using net.DialTimeout and
readyDeadline) also select on that error channel so you fail fast if kubectl
exits or reports bind errors; keep the existing cleanup closure (cleanup,
cmd.Process.Kill/Wait) but ensure you return the observed error from the watcher
rather than reporting ready when the child has failed.
- Around line 605-608: The code currently overwrites resCopy.Name with
annotateReservationName(reservation.Name, ...) which corrupts the original
identifier; instead preserve reservation.Name and store the annotated label
separately (e.g., set resCopy.AnnotatedName or resCopy.DisplayName or an
annotation map entry like resCopy.Annotations["displayName"] =
annotateReservationName(...)) before appending to allReservations.Items; then
update downstream code that currently concatenates Name (the callers around the
places that build name@context@host) to use the new AnnotatedName/DisplayName
for display purposes while keeping Name untouched for identity-based logic and
sort paths that expect exactly one '@'.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 371f6866-b21b-4e20-b2d5-7fd50666f3c5
📒 Files selected for processing (1)
tools/visualize-reservations/main.go