feat(gateway): URL-stable identity-based cluster naming for API-level upstreams#2137
feat(gateway): URL-stable identity-based cluster naming for API-level upstreams#2137mehara-rothila wants to merge 3 commits into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
534247e to
a99b916
Compare
|
@coderabbitai full review |
✅ Action performedFull review finished. |
5c3aed7 to
840cb13
Compare
|
@coderabbitai please do a full review of this PR , concrete suggestions, a risk analysis, and anything else you can surface. Please read the actual code, not just the description. Correctness & design
Risk analysis
Edge cases
Tests
|
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey package ==="
fd -t f . pkg/utils/clusterkey --exec cat {}
echo ""
echo "=== translator.go - resolveUpstreamCluster and related ==="
rg -n "resolveUpstreamCluster|sanitizeClusterName|clusterkey|APILevel|APILevelName|ClusterKey|EnvoyClusterName" pkg/xds/translator.go -A 5 -B 2
echo ""
echo "=== restapi.go - addUpstreamCluster and related ==="
rg -n "addUpstreamCluster|sanitizeEnvoyClusterName|clusterkey|APILevel|APILevelName|ClusterKey|EnvoyClusterName" pkg/transform/restapi.go -A 5 -B 2
echo ""
echo "=== Full clusterkey file list ==="
fd -t f . pkg/utils/clusterkey
echo ""
echo "=== Full translator.go resolveUpstreamCluster function ==="
rg -n "func resolveUpstreamCluster" pkg/xds/translator.go -A 50
echo ""
echo "=== Full addUpstreamCluster function ==="
rg -n "func addUpstreamCluster" pkg/transform/restapi.go -A 80
echo ""
echo "=== Integration test feature file ==="
fd -t f "api-level-url-stable.feature" --exec cat {}
echo ""
echo "=== Translator test file - new test cases ==="
fd -t f . pkg/xds -e go -e _test.go --exec rg -n "DedupSameAPI|MainSandboxNever|TestResolveUpstreamCluster" {} -A 30
echo ""
echo "=== restapi transformer test - new test cases ==="
fd -t f . pkg/transform -e go --exec rg -n "APILevelDefaultCluster|APILevelURL|TestRestAPITransformer_APILevel" {} -A 30
echo ""
echo "=== cmd/controller/main.go cluster name comment ==="
rg -n "cluster" cmd/controller/main.go -B 2 -A 5
echo ""
echo "=== Check if any old sanitize references remain ==="
rg -rn "sanitizeClusterName|sanitizeEnvoyClusterName|cluster_http|cluster_https" --include="*.go"
echo ""
echo "=== Check ClusterKey and EnvoyClusterName in restapi.go ==="
rg -n "ClusterKey\|EnvoyClusterName" pkg/transform/restapi.go -B 3 -A 3
echo ""
echo "=== data-model.md cluster section ==="
fd -t f "data-model.md" --exec grep -n -A 20 -i "cluster.nam" {}Length of output: 2285 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey package full content ==="
find . -path "*/utils/clusterkey*" -type f | xargs cat
echo ""
echo "=== Check all clusterkey usages ==="
rg -rn "clusterkey\." --include="*.go" -B 2 -A 3
echo ""
echo "=== Check APILevelName / APILevel usage ==="
rg -rn "APILevelName\|APILevel(" --include="*.go" -B 2 -A 3
echo ""
echo "=== clusterkey test file ==="
find . -path "*/utils/clusterkey*_test*" -o -path "*/clusterkey*test*" | xargs cat 2>/dev/null || true
echo ""
echo "=== translator.go full resolveUpstreamCluster ==="
awk '/func resolveUpstreamCluster/,/^func /' pkg/xds/translator.go 2>/dev/null | head -80
echo ""
echo "=== restapi.go full addUpstreamCluster ==="
awk '/func addUpstreamCluster/,/^func /' pkg/transform/restapi.go 2>/dev/null | head -100
echo ""
echo "=== RDC struct definitions: UpstreamCluster, ClusterKey, EnvoyClusterName ==="
rg -rn "ClusterKey\b\|EnvoyClusterName\b" --include="*.go" -B 2 -A 2
echo ""
echo "=== Check for any remaining references to old URL-derived naming ==="
rg -rn 'cluster_\|scheme.*cluster\|sanitize.*cluster\|cluster.*sanitize' --include="*.go" -B 2 -A 2
echo ""
echo "=== Full translator test for new cases ==="
rg -rn "TestResolveUpstreamCluster" --include="*.go" -A 60 | head -200
echo ""
echo "=== Full restapi transformer test for new cases ==="
rg -rn "TestRestAPITransformer_APILevel" --include="*.go" -A 60 | head -200
echo ""
echo "=== xds snapshot atomicity - how snapshot is built ==="
rg -rn "snapshot\|Snapshot\|SetSnapshot\|GenerateSnapshot" pkg/xds/translator.go -B 2 -A 5 | head -80
echo ""
echo "=== integration test feature file ==="
find . -name "api-level-url-stable.feature" | xargs cat 2>/dev/null
echo ""
echo "=== Check empty apiID handling ==="
rg -rn 'apiID\|api_id\|ApiId' pkg/utils/clusterkey --include="*.go" -B 1 -A 5 2>/dev/null || true
find . -path "*/clusterkey*" | xargs rg -n "empty\|\"\"" 2>/dev/null || trueLength of output: 8168 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== translator.go resolveUpstreamCluster ==="
awk '/func resolveUpstreamCluster/,/^func [a-zA-Z]/' pkg/xds/translator.go | head -100
echo ""
echo "=== translator.go clusterkey/APILevel imports and usage ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize" pkg/xds/translator.go
echo ""
echo "=== restapi.go addUpstreamCluster ==="
awk '/func addUpstreamCluster/,/^func [a-zA-Z]/' pkg/transform/restapi.go | head -120
echo ""
echo "=== restapi.go clusterkey/APILevel imports and usage ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize" pkg/transform/restapi.go
echo ""
echo "=== restapi.go UpstreamCluster struct / ClusterKey / EnvoyClusterName field definitions ==="
grep -n "ClusterKey\|EnvoyClusterName\|UpstreamCluster" pkg/transform/restapi.go | head -40
echo ""
echo "=== Check struct definitions globally ==="
grep -rn "ClusterKey\b\|EnvoyClusterName\b" --include="*.go" . 2>/dev/null || \
grep -rn "ClusterKey\|EnvoyClusterName" . -l | xargs grep -n "ClusterKey\|EnvoyClusterName" | head -60
echo ""
echo "=== Check struct definitions globally (no --include) ==="
grep -rn "ClusterKey\|EnvoyClusterName" . --include="*.go" 2>/dev/null | head -60 || \
find . -name "*.go" | xargs grep -n "ClusterKey\|EnvoyClusterName" 2>/dev/null | head -60
echo ""
echo "=== Translator tests for new cases ==="
find . -name "*translator*test*" -o -name "*test*translator*" | grep "\.go$" | xargs grep -n "TestResolveUpstreamCluster\|DedupSameAPI\|MainSandboxNever" 2>/dev/null | head -20
find . -name "*_test.go" -path "*/xds/*" | xargs grep -n "TestResolveUpstreamCluster\|DedupSameAPI\|MainSandbox" 2>/dev/null | head -5
echo ""
echo "=== Full translator test cases ==="
find . -name "*_test.go" -path "*/xds/*" | head -5 | xargs -I{} awk '/func TestResolveUpstreamCluster/,/^func [A-Z]/' {} 2>/dev/null | head -200
echo ""
echo "=== Full restapi transformer test cases ==="
find . -name "*_test.go" -path "*/transform/*" | head -5 | xargs -I{} awk '/func TestRestAPITransformer_APILevel/,/^func [A-Z]/' {} 2>/dev/null | head -200
echo ""
echo "=== Integration test ==="
find . -name "api-level-url-stable.feature" 2>/dev/null | xargs cat 2>/dev/null
echo ""
echo "=== Snapshot atomicity in translator ==="
grep -n "snapshot\|Snapshot\|SetSnapshot" pkg/xds/translator.go | head -30
echo ""
echo "=== Check old URL-derived naming remnants ==="
find . -name "*.go" | xargs grep -n "cluster_http\|cluster_https\|sanitizeCluster\|sanitizeEnvoy" 2>/dev/null | head -20
echo ""
echo "=== data-model.md cluster section ==="
find . -name "data-model.md" | xargs grep -n -A 25 -i "cluster.nam\|cluster name\|cluster-nam" 2>/dev/null | head -60Length of output: 3133 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Repo root structure ==="
ls -1
echo ""
echo "=== Find translator.go ==="
find . -name "translator.go" 2>/dev/null
echo ""
echo "=== Find restapi.go ==="
find . -name "restapi.go" 2>/dev/null
echo ""
echo "=== Find clusterkey files ==="
find . -name "*.go" | xargs grep -l "APILevel\|clusterkey" 2>/dev/null | head -20
echo ""
echo "=== Find feature file ==="
find . -name "*.feature" | xargs grep -l "url-stable\|url_stable\|cluster" 2>/dev/null | head -10
echo ""
echo "=== Find test files referencing APILevel or DedupSameAPI ==="
find . -name "*_test.go" | xargs grep -l "APILevel\|DedupSameAPI\|APILevelName\|url.stable\|URLStable" 2>/dev/null | head -10
echo ""
echo "=== Full top-level tree (2 levels) ==="
find . -maxdepth 2 -type d | sort | head -60Length of output: 3555 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== translator.go resolveUpstreamCluster ==="
awk '/func resolveUpstreamCluster/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/xds/translator.go | head -120
echo ""
echo "=== translator.go clusterkey/APILevel/sanitize references ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize\|import" ./gateway/gateway-controller/pkg/xds/translator.go | head -40
echo ""
echo "=== restapi.go addUpstreamCluster ==="
awk '/func addUpstreamCluster/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/transform/restapi.go | head -120
echo ""
echo "=== restapi.go clusterkey/APILevel/sanitize/ClusterKey/EnvoyClusterName references ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize" ./gateway/gateway-controller/pkg/transform/restapi.go | head -40
echo ""
echo "=== ClusterKey and EnvoyClusterName struct definitions ==="
find ./gateway -name "*.go" | xargs grep -n "ClusterKey\b\|EnvoyClusterName\b" 2>/dev/null | head -50
echo ""
echo "=== translator_test.go - new test functions ==="
awk '/func TestResolveUpstreamCluster/,/^func [A-Z]/' ./gateway/gateway-controller/pkg/xds/translator_test.go | head -200
echo ""
echo "=== restapi_test.go - new APILevel test functions ==="
awk '/func TestRestAPITransformer_APILevel/,/^}$/' ./gateway/gateway-controller/pkg/transform/restapi_test.go | head -200
echo ""
echo "=== Integration test feature file ==="
cat ./gateway/it/features/api-level-url-stable.feature
echo ""
echo "=== Snapshot atomicity in translator.go ==="
grep -n "snapshot\|Snapshot\|SetSnapshot\|GenerateSnapshot\|atomic" ./gateway/gateway-controller/pkg/xds/translator.go | head -30
echo ""
echo "=== Old sanitize / URL-derived naming remnants ==="
find ./gateway -name "*.go" | xargs grep -n "cluster_http\|cluster_https\|sanitizeCluster\|sanitizeEnvoy" 2>/dev/null | head -20
echo ""
echo "=== data-model.md cluster section ==="
find ./gateway -name "data-model.md" | xargs grep -n -A 25 -i "cluster.nam\|cluster name\|cluster-nam" 2>/dev/null | head -80
echo ""
echo "=== cmd/controller/main.go comment ==="
find . -path "*/cmd/controller/main.go" | xargs grep -n "cluster\|xds\|transform" 2>/dev/null | head -20Length of output: 25751
Correctness & Design1. Both builders produce identical namesVerified. clusterName := clusterkey.APILevelName(upstreamName, apiID)
clusterKey := clusterkey.APILevelName(upstreamName, rdc.Metadata.UUID)Both call the same helper with 2. Main vs sandbox never collideVerified. 3.
|
| API count | Approximate collision probability |
|---|---|
| 10,000 | ~2.7 × 10⁻¹² (negligible) |
| 1,000,000 | ~2.7 × 10⁻⁸ (negligible) |
| 10⁹ | ~2.7 × 10⁻² (non-trivial) |
For most production deployments this is fine. However, the consequence of a collision is silent cross-wiring (two distinct APIs share one Envoy cluster), not fail-closed. This is the worst kind of failure mode for an API gateway — requests for API-A are silently routed to the backend of API-B.
Concrete suggestion: Increase the truncation to 12 bytes (96 bits, 24 hex chars). This pushes the collision probability to n² / 2⁹⁷, which is effectively impossible even at 10⁹ APIs. The known-answer test vectors would need updating, but the change is one-line in clusterkey.go:
// Before
return hex.EncodeToString(sum[:8])
// After (96-bit truncation)
return hex.EncodeToString(sum[:12])The cluster name grows from main_f9811b73ac5d1a8d (21 chars) to main_f9811b73ac5d1a8d3c2b1a09 (29 chars), well within Envoy's name limits.
R2 — Empty apiID cross-wiring
APILevel("") returns e3b0c44298fc1c14 (SHA-256 of the empty string, documented in the test). Every API with an empty apiID resolves to main_e3b0c44298fc1c14 / sandbox_e3b0c44298fc1c14, silently sharing a cluster. The test documents that emptiness is enforced upstream at deploy time, but there is no defensive guard in APILevel or APILevelName itself. If the upstream validation is bypassed or misconfigured, the failure mode is again silent cross-wiring.
Concrete suggestion: Add a guard at the top of APILevel:
func APILevel(apiID string) string {
if apiID == "" {
// Caller contract violation: enforce non-emptiness here as a last resort.
panic("clusterkey: apiID must not be empty")
}
sum := sha256.Sum256([]byte(apiID))
return hex.EncodeToString(sum[:8])
}A panic here is appropriate: an empty apiID is a programming error (not a runtime condition), and silently producing a shared cluster is more dangerous than a crash.
R3 — First-redeploy migration window
On the next configuration push after upgrading, existing clusters are renamed from cluster_http_<host> to main_<hash>. CDS and RDS are separate xDS watches: in the worst case, Envoy ACKs the new CDS snapshot (which removes the old cluster name and adds the new one) before the RDS update arrives with the repointed routes. During that gap, active routes still reference cluster_http_<host>, which no longer exists → 503 NoRoute.
This hazard is inherent to the xDS protocol and is not new to this PR. After the first redeploy the identity-stable naming eliminates this window entirely for URL edits, which is the whole point. However, the migration is worth calling out explicitly in operator runbooks, not just in PR description prose.
R4 — Observability: stat label loses hostname
The upstream_cluster Envoy stat changes from cluster_http_api.example.com_8080 to main_f9811b73ac5d1a8d. Dashboards keyed on the human-readable host lose their correlation to the actual backend. The PR description calls this out, but there is no compensating mechanism in the xDS config (e.g., cluster metadata or typed per-filter config) that would let Prometheus recording rules or Envoy access-log formatters recover the original hostname.
Concrete suggestion: Populate the cluster's metadata.filter_metadata["envoy.lb"] or a custom namespace with api_id, api_name, and env fields. This costs nothing at runtime and lets operators add metric labels without re-parsing the hash.
R5 — Cluster count increase at scale
Two APIs sharing a backend URL now each own their cluster. For a deployment with many APIs sharing a small number of backends (common in fan-out API gateway topologies), this multiplies the cluster count by a factor proportional to the number of APIs per shared backend. Envoy itself has no hard cluster limit but memory and convergence time grow. Worth monitoring in large multi-tenant deployments.
Edge Cases
E1 — API with only sandbox upstream (no main)
No test covers the case where an API is deployed with a sandbox vhost but no main upstream URL. The transformer path should skip registering a main_<hash> cluster, but any route on the main vhost that still references ClusterKey: "main_<hash>" would point at a cluster not in UpstreamClusters → policy engine 503 NoRoute. This case needs a unit test.
E2 — Main and sandbox pointing at the same URL
Correctly handled. The new naming gives them distinct clusters (main_<hash> ≠ sandbox_<hash>). The integration test does not exercise this, but it is a latent-collision fix that the unit tests cover.
Test Coverage Assessment
| Area | Coverage | Gap |
|---|---|---|
| Naming algorithm | TestAPILevel with known-answer vectors and TestAPILevelName — strong |
None |
| Both builders agree | Structural (shared helper) | No explicit cross-module assertion test |
| URL stability | TestRestAPITransformer_APILevelURLStableAcrossURLEdit |
Translator-side equivalent not shown |
| Old names absent | Integration test (cluster_http_ / cluster_https_ absence checked) |
Translator unit test does not assert absence of old name |
| DefaultCluster matches real cluster | TestRestAPITransformer_APILevelDefaultClusterMatchesRealCluster |
Good |
| Main/sandbox never collide | TestResolveUpstreamCluster_MainSandboxNeverCollide exists |
Body not verified — confirm it asserts prefix-only difference |
| Empty apiID | Documented in TestAPILevel ("empty input is deterministic") |
No test asserting deploy-time enforcement rejects it |
| Only-sandbox API | Not present | Missing |
| Integration: host change | api-level-url-stable.feature — captures cluster names before/after, asserts unchanged |
Strong |
Suggested additional test:
func TestAPILevel_OnlySandboxAPI(t *testing.T) {
// API deployed with sandbox upstream only; no main upstream registered.
// The main_<hash> cluster must NOT appear in UpstreamClusters.
transformer := NewRestAPITransformer(...)
cfg := makeRestAPIWithSandboxOnly(...)
rdc, err := transformer.Transform(cfg)
require.NoError(t, err)
expectedMain := "main_" + clusterkey.APILevel(cfg.UUID)
_, exists := rdc.UpstreamClusters[expectedMain]
assert.False(t, exists, "main cluster must not be registered when no main upstream is configured")
}Summary of Actionable Items
| # | Severity | Item |
|---|---|---|
| 1 | Medium | Increase hash truncation from 8 to 12 bytes to eliminate collision risk at scale |
| 2 | Medium | Add empty-apiID guard (panic or error) in clusterkey.APILevel |
| 3 | Low | Add cluster metadata fields (api_id, env) for observability continuity |
| 4 | Low | Add unit test: API-with-only-sandbox produces no main_<hash> cluster |
| 5 | Low | Add translator-side unit test asserting old cluster_http_ / cluster_https_ names are absent |
| 6 | Informational | Document the first-redeploy 503 window in operator runbooks |
The core design — identity in, URL out, one shared helper for both builders, ClusterKey == EnvoyClusterName — is sound and correctly implemented. The main residual risk is the 64-bit collision producing silent cross-wiring and the unguarded empty-apiID case.
840cb13 to
42cbb91
Compare
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey.go - hash truncation ==="
find ./gateway -path "*/utils/clusterkey*" -name "*.go" | xargs cat
echo ""
echo "=== clusterkey_test.go - known-answer vectors ==="
find ./gateway -path "*/utils/clusterkey*" -name "*_test.go" | xargs cat
echo ""
echo "=== translator_test.go - TestResolveUpstreamCluster_NameNotURLDerived ==="
find ./gateway -name "*_test.go" -path "*/xds/*" | xargs grep -n "TestResolveUpstreamCluster_NameNotURLDerived\|NameNotURLDerived" -A 30 2>/dev/null || true
echo ""
echo "=== restapi_test.go - TestRestAPITransformer_APILevelMainOnlyHasNoSandboxCluster ==="
find ./gateway -name "*_test.go" -path "*/transform/*" | xargs grep -n "MainOnlyHasNoSandboxCluster\|APILevelMainOnly" -A 40 2>/dev/null || true
echo ""
echo "=== Confirm fragment length in clusterkey.go ==="
grep -n "sum\[:" $(find ./gateway -path "*/utils/clusterkey*" -name "*.go" ! -name "*_test.go") 2>/dev/null || true
echo ""
echo "=== Any remaining 8-byte / sum[:8] references ==="
find ./gateway -name "*.go" | xargs grep -n "sum\[:8\]\|sum\[:12\]" 2>/dev/null || trueLength of output: 13837
Item 1 — 96-bit truncation ✅
Item 4 — Main-only API, no sandbox cluster/route ✅
The direction is correct — Main mandatory, Sandbox optional — and the test precisely targets the 503 NoRoute failure mode. Item 5 — Old URL-derived name absent ✅
This is a complete regression lock. Item 2 — Empty
|
| # | Item | Status |
|---|---|---|
| 1 | 96-bit truncation | ✅ Resolved |
| 2 | Empty-apiID guard | ✅ Accepted as-is |
| 3 | Cluster metadata for observability | Deferred |
| 4 | Main-only no-sandbox test | ✅ Resolved |
| 5 | Old name absence test | ✅ Resolved |
| 6 | First-redeploy runbook note | Deferred |
Items 3 and 6 are noted as follow-up work. No open blockers in the addressed items.
… upstreams Name API-level Envoy clusters from sha256(apiID) with a main_/sandbox_ prefix instead of the backend URL, so a URL edit updates endpoints on the same named cluster instead of renaming it. Both xDS builders share one clusterkey helper.
42cbb91 to
9b9ee98
Compare
…on cases Two new api-level-url-stable scenarios: main and sandbox on the same backend host get separate identity-named clusters, and two APIs sharing a backend host each route under their own cluster. Both fail under the old URL-derived naming.
|
@coderabbitai please do a full review of this PR in its current state (head has changed since your last pass, it now also includes two new integration scenarios in api-level-url-stable.feature and minor doc fixes). In particular, review all of the code comments across the PR for verbosity and redundancy. Several doc/inline comments (for example the doc comment on APILevel in pkg/utils/clusterkey/clusterkey.go, which is about 13 lines over a 2-line function) restate design rationale that is already in the PR description. Compared to the commenting conventions in neighbouring gateway-controller files, are any comments too long or redundant? Where you find some, suggest a concise replacement. Also please cover, as a normal full review:
Base all style judgments on the repo's actual conventions, not a generic standard. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey.go full content ==="
find ./gateway -path "*/utils/clusterkey*" -name "*.go" ! -name "*_test.go" | xargs cat
echo ""
echo "=== clusterkey_test.go full content ==="
find ./gateway -path "*/utils/clusterkey*" -name "*_test.go" | xargs cat
echo ""
echo "=== api-level-url-stable.feature full content ==="
find . -name "api-level-url-stable.feature" | xargs cat
echo ""
echo "=== Commenting conventions in neighboring files (pkg/utils) ==="
find ./gateway -path "*/utils/*.go" ! -name "*_test.go" ! -path "*/clusterkey*" | head -5 | xargs -I{} sh -c 'echo "--- {} ---"; head -80 {}'
echo ""
echo "=== Commenting conventions in pkg/xds/translator.go (nearby exported funcs) ==="
grep -n "^// \|^func " ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== Commenting conventions in pkg/transform/restapi.go (nearby exported funcs) ==="
grep -n "^// \|^func " ./gateway/gateway-controller/pkg/transform/restapi.go | head -60
echo ""
echo "=== translator.go resolveUpstreamCluster full ==="
awk '/func resolveUpstreamCluster/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== translator.go inline comments around resolveUpstreamCluster ==="
grep -n -B2 -A20 "func resolveUpstreamCluster" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== restapi.go addUpstreamCluster full with comments ==="
grep -n -B2 -A80 "func addUpstreamCluster" ./gateway/gateway-controller/pkg/transform/restapi.go | head -100
echo ""
echo "=== cmd/controller/main.go comment about cluster naming ==="
find . -path "*/cmd/controller/main.go" | xargs grep -n -B5 -A10 "cluster\|xds\|transform\|sync" 2>/dev/null | head -60
echo ""
echo "=== translator_test.go - all new test cases full ==="
find ./gateway -name "*_test.go" -path "*/xds/*" | xargs grep -n "TestResolveUpstreamCluster" -A 50 2>/dev/null | head -300
echo ""
echo "=== restapi_test.go - all APILevel test cases full ==="
find ./gateway -name "*_test.go" -path "*/transform/*" | xargs grep -n "TestRestAPITransformer_APILevel" -A 60 2>/dev/null | head -300
echo ""
echo "=== data-model.md cluster section ==="
find . -name "data-model.md" | xargs grep -n -B5 -A40 -i "cluster" 2>/dev/null | head -120
echo ""
echo "=== Check PR diff summary via GitHub CLI ==="
gh pr view 2137 --repo wso2/api-platform --json files --jq '.files[].path' 2>/dev/null || echo "gh CLI not available"Length of output: 46691
Comment verbosity and redundancyThe repo convention in
|
| Area | Finding |
|---|---|
| Step-definition fidelity | I capture the Envoy cluster names prefixed "..." and should be unchanged are new step definitions not visible in this diff. If "unchanged" is implemented as simple string containment rather than exact set equality, the stability assertions in Scenarios 1 and 2 are weaker than intended. Worth verifying the step implementation uses set equality. |
| Independent-cluster deletion | No test asserts that deleting one of two APIs sharing a backend host leaves the other API's traffic unaffected. This is the key operational property of identity naming at scale. |
| Known-answer test coverage | clusterkey_test.go pins only two inputs ("api-1", "test-api"). A third vector for the empty-string SHA-256 is present but the more useful addition would be a vector for a realistic UUID-format apiID to confirm the function composes correctly with real inputs. |
Summary
The core logic is correct and the 96-bit truncation, shared helper, and ClusterKey == EnvoyClusterName constraints are all verified. The actionable items are:
- Shorten the
APILeveldoc comment to 3 lines (most impactful, clearly over the repo norm). - Shorten the
resolveUpstreamClusterblock comment to 3 lines. - Shorten the package comment to one line.
- Trim the
TestAPILevelfunction comment to remove the design-rationale sentence. - Scenario 5: Add an assertion for two distinct
main_<hash>cluster names and a post-delete routing check for API-A. - Verify the
should be unchangedstep definition uses exact set equality.
|
@coderabbitai addressed your review in d79e445:
Two points needed no code change: the "should be unchanged" step already uses exact set equality (strings.Join over sorted sets in steps_envoy_admin.go), and the "two distinct main_ count" assertion was intentionally skipped because a global /clusters count is unreliable (other deployed APIs also produce main_ clusters); the post-delete check covers that property robustly. Please do a final full review of this head, and as part of it verify one design question by reading the actual code across the repo (gateway-controller and platform-api): The API-level cluster name hashes the API's UUID (clusterkey.APILevel(apiID), where apiID = cfg.UUID in xds and rdc.Metadata.UUID in transform). A reviewer noted that "users can give api ids," so please confirm precisely:
Please cite file:line for the assignment, validation, and uniqueness constraints, give a clear verdict on the three risks (empty apiID, duplicate UUID, UUID changing across a URL edit), and flag anything else you find in this final pass. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Verify addressed items: clusterkey.go doc comments (current head) ==="
find ./gateway -path "*/utils/clusterkey*" -name "*.go" ! -name "*_test.go" | xargs cat
echo ""
echo "=== clusterkey_test.go (current head, check UUIDv7 vector) ==="
find ./gateway -path "*/utils/clusterkey*" -name "*_test.go" | xargs cat
echo ""
echo "=== resolveUpstreamCluster comment (current head) ==="
grep -n -B2 -A30 "func resolveUpstreamCluster\|resolveUpstreamCluster" ./gateway/gateway-controller/pkg/xds/translator.go | head -80
echo ""
echo "=== TestAPILevel comment (current head) ==="
find ./gateway -name "*_test.go" -path "*/clusterkey*" | xargs grep -n -B2 -A8 "func TestAPILevel" 2>/dev/null || true
echo ""
echo "=== api-level-url-stable.feature full content (current head) ==="
find . -name "api-level-url-stable.feature" | xargs cat
echo ""
echo "=== steps_envoy_admin.go - unchanged/set equality step ==="
find . -name "steps_envoy_admin.go" | xargs cat 2>/dev/null || find . -name "*envoy_admin*" | xargs cat 2>/dev/null
echo ""
echo "=== Where cfg.UUID is set in gateway-controller xds path ==="
rg -rn "\.UUID\b" --include="*.go" ./gateway/gateway-controller/pkg/xds/ -B3 -A3 | head -80
echo ""
echo "=== RestAPI/RDC Metadata.UUID assignment ==="
rg -rn "Metadata\.UUID\|\.UUID\s*=" --include="*.go" ./gateway/ -B3 -A3 | head -80
echo ""
echo "=== artifact-id annotation handling ==="
rg -rn "artifact-id\|artifactId\|artifact_id" --include="*.go" ./gateway/ -B3 -A5 | head -80
echo ""
echo "=== How the API deploy event / config is constructed (gateway-controller reconciler) ==="
find ./gateway -name "*.go" | xargs grep -ln "UUID\|Deploy\|Reconcile\|artifact" 2>/dev/null | grep -i "reconcil\|controller\|deploy\|handler" | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID" {} | head -20'
echo ""
echo "=== Platform-api: where API UUID is assigned on create ==="
find . -path "*/platform-api*" -name "*.go" 2>/dev/null | xargs grep -ln "UUID\|uuid\.New\|uuid\.Generate\|google/uuid" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid\.New\|uuid\.Generate" {} | head -20'Length of output: 11027 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Top-level directory structure ==="
ls -1
echo ""
echo "=== Platform-api directory ==="
find . -maxdepth 3 -type d -name "platform-api" 2>/dev/null
find . -maxdepth 3 -type d | grep -i "platform.api\|platformapi" 2>/dev/null | head -10
echo ""
echo "=== All Go files with UUID assignment ==="
rg -rn "uuid\.New\|uuid\.Generate\|NewRandom\|NewString\|google/uuid\|satori/go.uuid" --include="*.go" -B2 -A5 | head -120
echo ""
echo "=== API create handler / service ==="
find . -name "*.go" | xargs grep -ln "CreateAPI\|Create.*API\|api.*create\|POST.*api" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid" {} | head -20'
echo ""
echo "=== Gateway-controller: how it picks up API UUID from CRD/annotation ==="
find ./gateway -name "*.go" | xargs grep -ln "artifact-id\|ArtifactID\|artifactID\|Annotation\|annotation" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|artifact\|Annotation" {} | head -30'
echo ""
echo "=== RestAPIConfig / APIConfig struct UUID field ==="
rg -rn "type.*Config\|UUID\s+string\|Id\s+string\|ApiId\s+string" --include="*.go" ./gateway/ -A2 | head -80
echo ""
echo "=== pkg/transform/restapi.go - how Metadata.UUID is populated ==="
grep -n "UUID\|Metadata\|cfg\." ./gateway/gateway-controller/pkg/transform/restapi.go | head -40
echo ""
echo "=== pkg/xds/translator.go - how cfg.UUID is populated ==="
grep -n "UUID\|cfg\." ./gateway/gateway-controller/pkg/xds/translator.go | head -40
echo ""
echo "=== RestAPIConfig struct definition ==="
rg -rn "type RestAPIConfig\|type APIConfig\|type RestAPI\b" --include="*.go" ./gateway/ -A 30 | head -80
echo ""
echo "=== Gateway Kubernetes CRD / reconciler UUID source ==="
find ./gateway -name "*.go" | xargs grep -ln "Reconcile\|Reconciler" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "--- {} ---"; cat {} | head -150'Length of output: 3213 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Locate API structs ==="
rg -rn "UUID\s" --include="*.go" ./gateway/ -B5 | head -120
echo ""
echo "=== Locate the struct that has UUID + upstream config ==="
rg -rn "type.*Spec\|type.*Config\|type.*API\b\|type.*Upstream" --include="*.go" ./gateway/ -A15 | head -200
echo ""
echo "=== How cfg is constructed / populated in translator ==="
grep -n "cfg\b" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== How rdc is constructed in transform ==="
grep -n "rdc\b\|RDC\b\|RuntimeDeploy" ./gateway/gateway-controller/pkg/transform/restapi.go | head -60
echo ""
echo "=== Input type for translator resolveUpstreamCluster and transform addUpstreamCluster ==="
grep -n "func.*Translate\|func.*Transform\|func.*Build\|func.*Process" ./gateway/gateway-controller/pkg/xds/translator.go | head -20
grep -n "func.*Translate\|func.*Transform\|func.*Build\|func.*Process" ./gateway/gateway-controller/pkg/transform/restapi.go | head -20
echo ""
echo "=== Find API CRD spec type definition with UUID ==="
find ./gateway -name "*.go" -path "*/api*" | xargs grep -ln "UUID\|uuid" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; cat {}'
echo ""
echo "=== Find platform-api or control-plane code ==="
find . -maxdepth 5 -name "*.go" | xargs grep -ln "UUID\|uuid\.New" 2>/dev/null | grep -v "gateway-controller\|vendor\|_test" | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid" {} | head -20'
echo ""
echo "=== k8s-watcher or event consumer ==="
find ./gateway -name "*.go" | xargs grep -ln "Watch\|Event\|Informer\|Queue" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid" {} | head -20'
echo ""
echo "=== What fields does the artifact annotation carry ==="
rg -rn "artifact-id\|wso2.com/artifact" --include="*.go" ./gateway/ -B3 -A10 | head -80
rg -rn "artifact-id\|wso2.com/artifact" --include="*.yaml" ./gateway/ -B3 -A10 | head -80Length of output: 2670 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Full directory tree top 3 levels ==="
find . -maxdepth 3 -type d | sort | head -100
echo ""
echo "=== All Go files (non-vendor, non-test) containing uuid ==="
find . -name "*.go" ! -path "*/vendor/*" ! -path "*_test.go" | xargs grep -li "uuid" 2>/dev/null | head -30
echo ""
echo "=== platform-api or apk directory ==="
find . -maxdepth 2 -type d | grep -v "\.git\|node_modules" | head -80
echo ""
echo "=== RestAPIConfig or RestAPI type (look in all Go files) ==="
find . -name "*.go" ! -path "*/vendor/*" | xargs grep -ln "RestAPIConfig\|type.*RestAPI\b" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== APIDeployEvent or DeployEvent type ==="
find . -name "*.go" ! -path "*/vendor/*" | xargs grep -ln "DeployEvent\|Deploy.*Event\|Event.*Deploy" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|Deploy\|Event" {} | head -30'
echo ""
echo "=== uuid.New or uuid generation in non-test Go files ==="
find . -name "*.go" ! -path "*/vendor/*" ! -name "*_test.go" | xargs grep -n "uuid\.New\|uuid\.Generate\|uuid\.Must\|NewRandom\|NewString" 2>/dev/null | head -40Length of output: 5539 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api API create/UUID assignment ==="
find ./platform-api/src -name "*.go" | xargs grep -ln "uuid\|UUID" 2>/dev/null | head -15 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "uuid\|UUID" {} | head -30'
echo ""
echo "=== platform-api API model/struct ==="
find ./platform-api/src -name "*.go" | xargs grep -ln "type.*API\b\|RestAPI\|ApiId\|api_id" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|uuid\|ApiId\|api_id\|type.*API\b" {} | head -20'
echo ""
echo "=== gateway-controller handlers - how UUID is used ==="
find ./gateway/gateway-controller/pkg/api/handlers -name "*.go" | xargs grep -n "UUID\|uuid" 2>/dev/null | head -40
echo ""
echo "=== gateway-controller controlplane events - UUID fields ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go 2>/dev/null || echo "not found"
echo ""
echo "=== gateway-controller controlplane client - how events are received ==="
cat ./gateway/gateway-controller/pkg/controlplane/client.go 2>/dev/null | head -150
echo ""
echo "=== gateway-controller: where cfg struct is defined ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "type.*RestAPIConfig\|type.*APIConfig\|type.*APISpec" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== gateway-controller: handlers.go UUID ==="
find ./gateway/gateway-controller/pkg/api/handlers -name "handlers.go" | xargs cat 2>/dev/null | head -150
echo ""
echo "=== artifact-id annotation in gateway YAML ==="
find . -name "*.yaml" ! -path "*/.git/*" | xargs grep -ln "artifact-id\|api-platform.wso2.com" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "artifact-id\|api-platform.wso2" {} | head -10'
echo ""
echo "=== gateway-controller: annotation-based UUID reading ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -n "artifact-id\|ArtifactID\|api-platform.wso2.com" 2>/dev/null | head -30Length of output: 9465 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api src structure ==="
find ./platform-api/src -maxdepth 4 -type f -name "*.go" | sort | head -50
echo ""
echo "=== platform-api api directory ==="
find ./platform-api/src/api -name "*.go" | xargs -I{} sh -c 'echo "=== {} ==="; cat {}' 2>/dev/null | head -400
echo ""
echo "=== gateway-controller events.go ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go 2>/dev/null
echo ""
echo "=== helpers.go in handlers ==="
cat ./gateway/gateway-controller/pkg/api/handlers/helpers.go 2>/dev/null | head -150
echo ""
echo "=== gateway-controller pkg/api/handlers/handlers.go ==="
cat ./gateway/gateway-controller/pkg/api/handlers/handlers.go 2>/dev/null | head -200Length of output: 20217 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api dto/api.go ==="
cat ./platform-api/src/internal/dto/api.go
echo ""
echo "=== platform-api dto/gateway_event.go ==="
cat ./platform-api/src/internal/dto/gateway_event.go
echo ""
echo "=== platform-api handler/api.go - CreateAPI and UUID assignment ==="
grep -n "uuid\|UUID\|Id\b\|\.Id\b\|uuid\.New\|uuid\.Generate\|uuid\.Must" ./platform-api/src/internal/handler/api.go | head -50
echo ""
echo "=== platform-api - service layer for API creation ==="
find ./platform-api/src/internal -name "*.go" -path "*service*" | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "uuid\|UUID\|\.Id\b\|uuid\.New" {} | head -20'
find ./platform-api/src/internal -name "*api*service*" -o -name "*service*api*" | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}' 2>/dev/null | head -200
echo ""
echo "=== gateway-controller/pkg/controlplane/events.go ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go
echo ""
echo "=== gateway-controller/pkg/controlplane/client.go UUID fields ==="
grep -n "UUID\|uuid\|ApiId\|api_id\|Id\b" ./gateway/gateway-controller/pkg/controlplane/client.go | head -40
echo ""
echo "=== gateway-controller runtime_bootstrap.go UUID ==="
grep -n "UUID\|uuid" ./gateway/gateway-controller/cmd/controller/runtime_bootstrap.go | head -20
echo ""
echo "=== gateway-controller handlers.go - CreateAPI / updateAPI handlers UUID ==="
cat ./gateway/gateway-controller/pkg/api/handlers/handlers.go | head -300
echo ""
echo "=== gateway-controller helpers.go - UUID extraction from requests ==="
cat ./gateway/gateway-controller/pkg/api/handlers/helpers.go
echo ""
echo "=== Find artifact-id annotation key definition ==="
find . -name "*.go" ! -path "*/.git/*" | xargs grep -n "artifact-id\|ArtifactId\b\|artifactId\b\|api-platform.wso2.com" 2>/dev/null | grep -v "Binary\|vendor" | head -40
echo ""
echo "=== GatewayArtifact / artifact UUID field ==="
grep -n "ArtifactId\|artifact.*id\|Id.*artifact\|UUID\|uuid" ./platform-api/src/api/generated.go | grep -i "artifact\|api.id\|apiId" | head -20Length of output: 16370 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Find API service layer ==="
find ./platform-api/src/internal -type d | sort
echo ""
echo "=== Find service files ==="
find ./platform-api/src/internal -name "*.go" | grep -v "_test\|handler\|handler" | sort | head -40
echo ""
echo "=== Find uuid.New / uuid generation anywhere in platform-api ==="
find ./platform-api/src -name "*.go" | xargs grep -n "uuid\.New\|uuid\.Must\|google/uuid\|NewRandom\|NewString\|NewV4\|NewV7\|UUIDv7\|uuidv7" 2>/dev/null | head -40
echo ""
echo "=== gateway-controller events.go ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go 2>/dev/null
echo ""
echo "=== gateway-controller client.go (search for where UUID is read from event) ==="
grep -n "UUID\|uuid\|\.Id\b\|ApiId\|api_id\|Artifact" ./gateway/gateway-controller/pkg/controlplane/client.go | head -40
echo ""
echo "=== gateway-controller handlers: how they set cfg.UUID or pass apiID ==="
grep -n "UUID\|uuid\|\.Id\b\|ApiId\|cfg\." ./gateway/gateway-controller/pkg/api/handlers/handlers.go | head -40
echo ""
echo "=== Find cfg.UUID assignment in translator ==="
grep -n "cfg\.UUID\|\.UUID\b" ./gateway/gateway-controller/pkg/xds/translator.go | head -20
echo ""
echo "=== Find rdc.Metadata.UUID assignment ==="
grep -n "Metadata\.UUID\|rdc\.Metadata\|\.UUID\b" ./gateway/gateway-controller/pkg/transform/restapi.go | head -20
echo ""
echo "=== RestAPIConfig struct definition in gateway-controller ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "type RestAPIConfig\|type APISpec\|type APIConfig" 2>/dev/null | xargs -I{} sh -c 'echo "=== {} ==="; grep -n -A20 "type RestAPIConfig\|type APISpec\|type APIConfig" {}'
echo ""
echo "=== artifact-id annotation - gateway YAML fields ==="
find . -name "*.go" ! -path "*/.git/*" | xargs grep -n "artifact-id\|wso2.com/artifact\|ArtifactID\b\|artifactID\b" 2>/dev/null | head -30
echo ""
echo "=== gateway-controller pkg/api/admin or management - REST API struct with UUID ==="
find ./gateway/gateway-controller/pkg/api -name "*.go" | xargs grep -ln "UUID\|type.*RestAPI\|type.*API\b" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|type.*RestAPI\|type.*API\b" {} | head -20'Length of output: 34021 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api service/api.go - UUID assignment ==="
grep -n "uuid\|UUID\|\.Id\b\|ID\b" ./platform-api/src/internal/service/api.go | head -40
echo ""
echo "=== platform-api model/api.go - API model with UUID ==="
cat ./platform-api/src/internal/model/api.go 2>/dev/null | head -80
echo ""
echo "=== platform-api service/api.go - CreateAPI function ==="
grep -n -A40 "func.*Create" ./platform-api/src/internal/service/api.go | head -80
echo ""
echo "=== gateway-controller client.go - how APIDeployedEvent.APIID becomes cfg.UUID ==="
grep -n -B5 -A20 "APIDeployedEvent\|APIID\|api_id\|apiId\|cfg\.UUID\b" ./gateway/gateway-controller/pkg/controlplane/client.go | head -120
echo ""
echo "=== gateway-controller client.go - how cfg (RestAPIConfig) is stored and its UUID field ==="
grep -n "RestAPIConfig\|APIConfig\|cfg\.UUID\|\.UUID\s*=" ./gateway/gateway-controller/pkg/controlplane/client.go | head -30
echo ""
echo "=== gateway-controller: where cfg struct is populated from event ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "RestAPIConfig\|cfg\.UUID" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|RestAPIConfig\|cfg\." {} | head -30'
echo ""
echo "=== gateway-controller - direct YAML path (artifact-id annotation) ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -n "artifact-id\|wso2.com/artifact\|gateway.api-platform" 2>/dev/null | head -20
echo ""
echo "=== platform-api service/gateway_events.go - how APIID is populated in deploy event ==="
grep -n -B3 -A15 "APIID\|ApiId\|apiId\|UUID\|uuid" ./platform-api/src/internal/service/gateway_events.go | head -100
echo ""
echo "=== platform-api internal model/api.go ==="
cat ./platform-api/src/internal/model/api.go 2>/dev/null
echo ""
echo "=== platform-api repository for API - where UUID is generated ==="
find ./platform-api/src/internal/repository -name "*.go" | xargs grep -n "UUID\|uuid\.New" 2>/dev/null | head -30Length of output: 4861
Addressed items — verified
One residual: both the package comment and Design question: UUID provenance and controllabilityQ1 — Where is the UUID assigned, and can a client supply it?File:
Verdict: A client can supply a handle/slug via Q2 — Can an empty, duplicate, or user-controlled UUID reach
|
| Risk | Finding |
|---|---|
| Empty UUID | No path. uuid.New() always returns a 36-character non-empty string. |
| Duplicate UUID | No path. uuid.New() is crypto-random v4; birthday collision probability is negligible. |
| User-controlled UUID | No path. req.Id maps to the handle/slug only. The UUID is never taken from client input. |
| UUID changes on URL edit | No. UpdateAPI reads the existing UUID from the database; the URL edit changes only upstream config. |
Q3 — Is omitting an in-function guard in clusterkey.APILevel safe?
Yes. There is no supported path — through the platform-api control plane, the gateway-controller deploy event, or the gateway-controller REST handlers — by which an empty or user-controlled UUID reaches clusterkey.APILevel. The empty-input test in clusterkey_test.go is correctly positioned as a documentation of deterministic behaviour, not a guard for a real runtime condition.
Remaining comment observation (very minor)
The APILevelName doc (clusterkey.go Line 36–37) ends with "so both xDS builders name clusters identically." The package comment (Line 19–20) already says "shared by both xDS builders so they name clusters identically." The two sentences are equivalent. If you want to eliminate the duplication, the APILevelName doc can simply read:
// APILevelName joins the env prefix ("main"/"sandbox") to the APILevel fragment
// to form the full Envoy cluster name.Not a blocker; noting for completeness.
Overall verdict
The core logic, naming contract, and test coverage are correct and complete. All substantive review items from previous passes are addressed. The UUID provenance is fully server-controlled with no client-injection path. The PR is ready to merge.
…rage Shorten the doc comments on clusterkey.APILevel, the package, and resolveUpstreamCluster to the repo's terse style. Add a UUID-shaped known-answer vector and an integration check that deleting one API does not disturb another sharing the same backend host.
d79e445 to
10548bd
Compare
|
@coderabbitai one final verification pass on head 10548bd, please, focused on production safety for EXISTING deployments. Read the actual code (the snapshot-building and startup config-load paths, the xDS translator, and pkg/policyxds). This PR changes API-level Envoy cluster names from URL-derived (cluster_) to identity-derived (main / sandbox_, hash = sha256(apiID)[:12]). Please confirm whether rolling this out can crash, disrupt, or drop traffic for already-running APIs, and answer concretely:
Give a clear verdict: does deploying this break or disrupt existing running APIs, and is anything required of operators beyond updating dashboards keyed on the old cluster names? Flag anything else you find in this final pass. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== pkg/xds/translator.go - snapshot building, SetSnapshot, GenerateSnapshot ==="
grep -n "snapshot\|Snapshot\|SetSnapshot\|GenerateSnapshot\|atomic\|cache\.New\|consistent" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== pkg/xds/translator.go - full snapshot assembly function ==="
awk '/func.*[Ss]napshot\|func.*[Tt]ranslat\|func.*[Gg]enerate/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/xds/translator.go | head -200
echo ""
echo "=== How xDS resources are grouped into a snapshot (CLS+RDS together?) ==="
grep -n "Clusters\|Routes\|Listeners\|Endpoints\|ResourcesFor\|NewSnapshot\|cache\.Snapshot\|snapshot\.Resources" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== pkg/policyxds - all files ==="
find ./gateway -path "*/policyxds*" -name "*.go" ! -name "*_test.go" | sort | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== cmd/controller/main.go - startup/bootstrap path ==="
cat ./gateway/gateway-controller/cmd/controller/main.go | head -300
echo ""
echo "=== Startup: how existing configs are loaded on boot ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "bootstrap\|Bootstrap\|startup\|startup\|loadConfig\|LoadConfig\|loadAll\|reconcileAll\|reconcileExisting\|InitCache\|initCache" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; cat {} | head -150'
echo ""
echo "=== How existing API configs are replayed into cache on startup ==="
grep -n "for\|range\|existing\|list\|List\|cache\|Cache\|init\|Init" ./gateway/gateway-controller/pkg/controlplane/client.go | head -60
echo ""
echo "=== client.go startup config load full ==="
cat ./gateway/gateway-controller/pkg/controlplane/client.go | head -200
echo ""
echo "=== Check if cluster name is stored in DB or any config store ==="
find ./platform-api/src -name "*.go" | xargs grep -n "cluster_\|ClusterName\|EnvoyCluster\|cluster_name" 2>/dev/null | head -30
echo ""
echo "=== Check if cluster name is stored in k8s CRD or annotation ==="
find ./gateway -name "*.go" | xargs grep -n "cluster_\|ClusterName\|EnvoyCluster\|cluster_name" 2>/dev/null | grep -v "_test\|clusterkey\|APILevel\|sanitize" | head -30
echo ""
echo "=== Check for any SQL files or schema files ==="
find . -name "*.sql" -o -name "*schema*" -o -name "*migration*" 2>/dev/null | grep -v ".git" | head -20Length of output: 145966 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== xDS cache type - consistent vs naive ==="
grep -rn "consistent\|Consistent\|snapshot_cache\|NewSnapshotCache\|NewSnapshotCacheWithOptions\|cache\.New\b\|NewLinearCache" ./gateway/gateway-controller/ --include="*.go" | head -30
echo ""
echo "=== xDS snapshot construction - what resources go into one snapshot call ==="
grep -n "cache\.NewSnapshot\|NewSnapshot\|snapshot\.New\|MakeSnapshot" ./gateway/gateway-controller/pkg/xds/translator.go | head -20
grep -rn "cache\.NewSnapshot\|NewSnapshot\|snapshot\.New\|MakeSnapshot" ./gateway/gateway-controller/ --include="*.go" | head -30
echo ""
echo "=== pkg/xds/translator.go SetSnapshot call with all resource types ==="
grep -n -B5 -A30 "SetSnapshot\|setSnapshot\|cache\.SetSnapshot" ./gateway/gateway-controller/pkg/xds/translator.go | head -100
echo ""
echo "=== policyxds snapshot.go - full content ==="
cat ./gateway/gateway-controller/pkg/policyxds/snapshot.go
echo ""
echo "=== policyxds - SetSnapshot with all resource types ==="
grep -n -B5 -A30 "SetSnapshot\|NewSnapshot\|snapshot\." ./gateway/gateway-controller/pkg/policyxds/snapshot.go | head -100
echo ""
echo "=== Does policyxds snapshot include cluster names? ==="
grep -n "Cluster\|cluster\|ClusterKey\|EnvoyClusterName" ./gateway/gateway-controller/pkg/policyxds/snapshot.go | head -30
echo ""
echo "=== startup bootstrap: how does the controller load existing API state ==="
cat ./gateway/gateway-controller/cmd/controller/runtime_bootstrap.go
echo ""
echo "=== startup: initial xDS cache population ==="
grep -rn "for.*range\|for.*apis\|for.*all\|existing\|loadExisting\|initSnapshot\|bootstrap\|Bootstrap" ./gateway/gateway-controller/pkg/xds/ --include="*.go" -A5 | head -100
echo ""
echo "=== startup/init: how many snapshots are created per boot (one per API or one consolidated?) ==="
grep -rn "SetSnapshot\|snapshot" ./gateway/gateway-controller/pkg/xds/translator.go | grep -i "set\|create\|init\|boot" | head -30
echo ""
echo "=== Check xDS version in go.mod - solo-io/go-control-plane or envoyproxy? ==="
grep "go-control-plane\|envoy.*xds\|xds.*server" ./gateway/gateway-controller/go.mod | head -10
echo ""
echo "=== Connection draining: drain timeout, drain_connections, close_connections_on_timeout ==="
grep -rn "DrainTimeout\|drain_timeout\|drain_connections\|close_conn\|DRAIN" ./gateway/gateway-controller/ --include="*.go" | head -20
grep -rn "DrainTimeout\|drain_timeout" ./gateway/ --include="*.yaml" --include="*.json" | head -10Length of output: 6357 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== pkg/xds/snapshot.go full content ==="
cat ./gateway/gateway-controller/pkg/xds/snapshot.go
echo ""
echo "=== pkg/xds/translator.go - how it calls snapshot manager ==="
grep -n "snapshotManager\|SnapshotManager\|BuildSnapshot\|UpdateSnapshot\|GenerateSnapshot\|Translate\b" ./gateway/gateway-controller/pkg/xds/translator.go | head -30
echo ""
echo "=== main.go - how startup calls xds snapshot generation ==="
grep -n "GenerateSnapshot\|UpdateSnapshot\|BuildSnapshot\|snapshotManager\|policyManager\|loadRuntime\|hydrateConfig\|LoadArtifact\|xds\." ./gateway/gateway-controller/cmd/controller/main.go | head -60
echo ""
echo "=== how startup triggers initial xDS snapshot (look for the initial snapshot push) ==="
grep -n -A10 "loadRuntimeConfigs\|GenerateSnapshot\|UpdateSnapshot\|InitialSnapshot\|snapshotManager" ./gateway/gateway-controller/cmd/controller/main.go | head -100
echo ""
echo "=== pkg/xds/snapshot.go - NewSnapshot call with all resource types ==="
grep -n "NewSnapshot\|cache\.Snapshot\|resources\[" ./gateway/gateway-controller/pkg/xds/snapshot.go | head -30
echo ""
echo "=== drain timeout in envoy bootstrap config ==="
find ./gateway -name "*.yaml" -o -name "*.json" | xargs grep -ln "drain_time\|drain_timeout\|overload\|connection_pool" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "drain_time\|drain_timeout\|per_connection_buffer\|connection_pool" {}'
echo ""
echo "=== envoy bootstrap or static resources ==="
find ./gateway -name "envoy-bootstrap*" -o -name "envoy.yaml" -o -name "*bootstrap*.yaml" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== check DB schema for cluster name fields ==="
find . -name "*.go" -path "*/storage*" | xargs grep -n "cluster_name\|ClusterName\|EnvoyCluster" 2>/dev/null | head -20
echo ""
echo "=== Verify startup generates one consolidated snapshot (not per-API) ==="
grep -n "for.*range\|for.*cfg\|for.*api" ./gateway/gateway-controller/pkg/xds/snapshot.go | head -20Length of output: 10951
1. Rollout atomicity and startup snapshot consolidationSingle atomic snapshot — confirmed. All resource types share one version string. Startup — one consolidated snapshot, not N per-API. On boot:
There is no per-API snapshot push on startup. 2. CDS-vs-RDS ordering windowThe window exists but is inherent to xDS SotW — not introduced by this PR.
The theoretical window: Envoy receives and ACKs the new CDS (warms In practice, Envoy's cluster warm-and-ready sequence delays the CDS ACK until all new clusters are warmed, which gives RDS time to arrive. But this is not a hard guarantee under all timing conditions. This risk is one-time on upgrade — it applies only to the initial cluster rename on the first config regeneration after deploying this PR. Post-upgrade, URL edits no longer rename the cluster, so this window never recurs for URL changes. 3. In-flight connection handling during the renameNo explicit 4. Persistence — no cluster names stored anywhereConfirmed: cluster name is purely derived in memory, never persisted.
5. Operational side effects
Neither effect causes a crash or routing failure. 6. RevertabilitySafe. The URL config (scheme, host, port, path) is stored in the DB as part of the API spec. Reverting the PR leaves the DB untouched; the old code regenerates VerdictDeploying this PR does not crash or permanently disrupt existing APIs. The only production risk is the one-time CDS-before-RDS window on the first config regeneration after upgrade, where a brief 503 is possible per API. This is inherent to xDS SotW and applies to any cluster rename operation in this stack; it is not new to this PR. Operator actions required:
|
Summary
Changes how the gateway names the Envoy clusters for an API's main and sandbox upstreams. Cluster names are now derived from the API's identity,
main_<hash>/sandbox_<hash>, where the hash is the first 12 bytes of sha256(apiID) in hex and the env is a plaintextmain_/sandbox_prefix (not part of the hash), instead of from the backend URL (cluster_<scheme>_<sanitized-host>). Both xDS builders (the Envoy xDS translator and the RuntimeDeployConfig transform path) derive the name through one shared helper, so the two paths cannot diverge.Why
With URL-derived names, editing a backend URL renames the cluster: the old cluster is destroyed, a new one is created, and routes are repointed, draining in-flight connections on every URL change. With identity-derived names the cluster name never changes for the lifetime of the API, so a URL edit propagates as an endpoint update on the same cluster and live traffic is not dropped. The name carries no URL information, so host, port, scheme, and path edits are all equally safe.
Design decisions
Identity in, URL out. The hash input is the apiID only; the env (main or sandbox) is a plaintext prefix on the name, not part of the hash. Anything URL-derived in the name would defeat the purpose, since some class of URL edit would still rename the cluster.
apiID hashed, env as a prefix. main and sandbox of the same API share one hash fragment and are told apart by the
main_/sandbox_prefix, so an operator can correlate an API's two clusters at a glance. The prefix guarantees main and sandbox never collide without relying on the hash.96-bit identity hash. The fragment is the first 12 bytes (96 bits) of sha256(apiID). Cross-API uniqueness rests on this truncation, and 96 bits keeps the birthday-collision probability negligible even at very large API counts, so two distinct APIs cannot silently share one Envoy cluster.
One naming helper for both builders. The gateway has two builders that must agree on cluster names (the Envoy xDS translator and the policy-side RuntimeDeployConfig transform). Both now call
pkg/utils/clusterkey, replacing the two hand-rolled sanitize functions that had to be kept in sync manually.ClusterKey and EnvoyClusterName unified. In the RDC path the internal cluster key and the Envoy-facing cluster name are now the same string. If they differ, the policy engine's default upstream cluster metadata points at a name Envoy does not know, which surfaces as 503 NoRoute.
Latent collision fixed. Under URL-derived naming, a main and sandbox upstream pointing at the same URL collided into one cluster, and two different APIs sharing a backend URL shared one cluster. The env prefix now keeps main and sandbox always distinct, and hashing the apiID gives each API its own fragment, so each API owns its clusters.
Migration notes (one-time effects on existing deployments)
cluster_<scheme>_<host>names in theupstream_clusterlabel need a one-time update.What changed
pkg/utils/clusterkey(new):APILevel(apiID)/APILevelName(env, apiID)naming helpers plus unit tests.pkg/xds/translator.go:resolveUpstreamClustertakes the API id and derives the hash name;sanitizeClusterNameremoved.pkg/transform/restapi.go:addUpstreamClusterderives the same hash name;EnvoyClusterNameset to the cluster key;sanitizeEnvoyClusterNameremoved.cmd/controller/main.go: comment documenting that both xDS paths must keep cluster-name derivation in sync (plus an import-order fix in the touched file).gateway/spec/impls/1-basic-gateway-with-controller/data-model.md: cluster-naming section updated from the old URL-sanitized scheme to the identity-derived scheme.Test coverage
clusterkeytests with known-answer vectors that pin the algorithm to sha256(apiID)[:12]; translator tests assert URL-independence and that the name carries no URL-derived information (TestResolveUpstreamCluster_DedupSameAPIDifferentURLs,TestResolveUpstreamCluster_MainSandboxNeverCollide,TestResolveUpstreamCluster_NameNotURLDerived); transformer tests assert the name shape on routes and clusters, that a route's default cluster always references a registered cluster (TestRestAPITransformer_APILevelDefaultClusterMatchesRealCluster), that a URL edit does not rename the cluster (TestRestAPITransformer_APILevelURLStableAcrossURLEdit), and that a main-only API registers no sandbox cluster or route (TestRestAPITransformer_APILevelMainOnlyHasNoSandboxCluster).api-level-url-stable.featurecovers a main upstream URL update that changes both host and path, a sandbox upstream URL update, default cluster resolution under cluster_header routing, main and sandbox on the same backend host getting separate identity-named clusters, and two APIs sharing a backend host each routing under their own cluster (with a post-delete check that removing one does not disturb the other). The main and sandbox scenarios assert cluster identity through the Envoy admin endpoint (identity-named clusters present and the oldcluster_<scheme>_<host>names absent, before and after the update), so they fail under the previous naming scheme.