Add ClusterIP service with SessionAffinity for MCP server backend#3992
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##             main    #3992    +/-  ##
========================================
+ Coverage   68.61%   68.74%   +0.12%
========================================
  Files         439      439
  Lines       44867    44897      +30
========================================
+ Hits        30786    30863      +77
+ Misses      11686    11627      -59
- Partials     2395     2407      +12
```
test/e2e/chainsaw/operator/single-tenancy/test-scenarios/sse/assert-mcpserver-svc.yaml
jhrozek left a comment
See inline comments for specific observations and suggestions.
Pre-existing observation: RemoveWorkload in pkg/container/kubernetes/client.go only deletes the StatefulSet, not the associated services. This was already an issue for the headless service before this PR. The operator path is covered by finalizeMCPServer. Worth tracking separately if the CLI-to-K8s path is a supported workflow.
The proxy-runner connects to MCP server pods through a headless service, which bypasses kube-proxy and provides no load balancing or session affinity. If the StatefulSet scales beyond 1 replica, sessions break because requests may hit different pods.

Add a new regular ClusterIP service (mcp-<name>) with SessionAffinity: ClientIP alongside the existing headless service. The proxy-runner now uses this new service as its target host, getting proper load balancing with session stickiness. The headless service is retained for DNS discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
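As a rough sketch of what such a backend service looks like (the name, port, and selector labels here are illustrative assumptions, not copied from the PR):

```yaml
# Illustrative sketch of the new ClusterIP backend service; name, port,
# and selector labels are assumptions, not taken from the repository.
apiVersion: v1
kind: Service
metadata:
  name: mcp-myserver            # mcp-<name>
spec:
  type: ClusterIP
  selector:
    app: mcp-myserver           # assumed label matching the StatefulSet pods
  ports:
    - port: 8080                # illustrative port
      targetPort: 8080
  sessionAffinity: ClientIP     # pin each client IP to one backend pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 1800      # explicit 30 min instead of the 3 h default
```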
- Extract shared `applyService` helper to reduce duplication between `createHeadlessService` and `createMCPService`
- Add explicit `SessionAffinityConfig.ClientIP.TimeoutSeconds` (1800s) instead of relying on the Kubernetes 3h default
- Rename `SSEHeadlessServiceName` to `MCPServiceName` across types, setup, and client packages
- Rename `transportTypeRequiresHeadlessService` to `transportTypeRequiresBackendServices`
- Extract `deleteIfExists` helper in operator finalizer, removing `nolint:gocyclo`
- Fix headless+NodePort conflict: skip NodePort type promotion for headless services (Kubernetes rejects `clusterIP: None` + `type: NodePort`)
- Use `slog.Debug` instead of `slog.Info` for service creation success paths
- Add `spec.type` and `sessionAffinityConfig` assertions to e2e tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
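For contrast, a minimal sketch of the retained headless service (field values are illustrative). Kubernetes rejects a Service combining `clusterIP: None` with `type: NodePort`, which is why the NodePort type promotion is skipped for headless services:

```yaml
# Illustrative headless service kept for DNS discovery; names and ports
# are assumptions, not copied from the PR.
apiVersion: v1
kind: Service
metadata:
  name: mcp-myserver-headless   # mcp-<name>-headless
spec:
  clusterIP: None               # headless: DNS resolves directly to pod IPs
  type: ClusterIP               # must stay ClusterIP; NodePort is invalid here
  selector:
    app: mcp-myserver
  ports:
    - port: 8080
      targetPort: 8080
```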
Force-pushed 8cf2377 to 7a0f7be
Set the StatefulSet as owner of both the headless and MCP ClusterIP services so Kubernetes garbage-collects them when the StatefulSet is deleted. This closes the resource leak gap flagged in review. Add table-driven unit test covering both SSE and streamable-http transports, verifying service creation, session affinity config, and owner references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
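With the StatefulSet set as owner, each service carries an owner reference roughly like the following sketch (the UID and names are placeholder assumptions):

```yaml
# Illustrative owner reference on the Service; all values are assumptions.
metadata:
  name: mcp-myserver
  ownerReferences:
    - apiVersion: apps/v1
      kind: StatefulSet
      name: mcp-myserver
      uid: 3c4f0a1e-0000-0000-0000-000000000000  # placeholder UID
      controller: true            # marks the StatefulSet as managing controller
      blockOwnerDeletion: true    # foreground deletion waits on this object
```

When the StatefulSet is deleted, the garbage collector then removes both services without any explicit cleanup code.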
Assert UID, APIVersion, Controller, and BlockOwnerDeletion fields on owner references, not just Kind and Name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```go
// DeployWorkload implements runtime.Runtime.
//
//nolint:gocyclo
```
I would have preferred to refactor and avoid the nolint...
```go
func (r *MCPServerReconciler) deleteIfExists(ctx context.Context, obj client.Object, name, namespace, kind string) error {
	err := r.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, obj)
	if err == nil {
		if delErr := r.Delete(ctx, obj); delErr != nil && !errors.IsNotFound(delErr) {
```
We have lost the logging here; is that intentional?
yrobla left a comment
Approving with a pair of comments.
Summary

- Add a regular ClusterIP service (`mcp-<name>`) with `SessionAffinity: ClientIP` for MCP server StatefulSets, alongside the existing headless service (`mcp-<name>-headless`)

Context

The headless service bypasses kube-proxy, so DNS returns raw pod IPs with no load balancing or session affinity. If the MCP server StatefulSet scales beyond 1 replica, sessions break because requests may hit different pods. The new ClusterIP service solves this with `SessionAffinity: ClientIP`.

Changes

- `pkg/container/kubernetes/client.go`: add a `createMCPService()` function, called from `DeployWorkload()` after `createHeadlessService()`. Move the `SSEHeadlessServiceName` assignment to the new service. Extract a `serviceFieldManager` constant.
- `cmd/thv-operator/controllers/mcpserver_controller.go`: delete the `mcp-<name>` service in `finalizeMCPServer()`
- `test/e2e/chainsaw/operator/**/assert-mcpserver-svc.yaml`: assert `sessionAffinity: ClientIP`
- `test/e2e/chainsaw/operator/**/chainsaw-test.yaml`

Test plan

- `task lint` passes (0 issues)
- `task test` passes (unit tests)
- `kubectl get svc mcp-<name>-headless` shows the headless service (unchanged)
- `kubectl get svc mcp-<name>` shows the new service with `sessionAffinity: ClientIP`

🤖 Generated with Claude Code
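A minimal sketch of what the e2e assertion for the new service might check, based on the fields named in this PR (the exact file contents in the repo may differ):

```yaml
# Illustrative chainsaw-style assertion for the new backend service;
# the name is a placeholder and the real assert file may include more fields.
apiVersion: v1
kind: Service
metadata:
  name: mcp-myserver            # mcp-<name>
spec:
  type: ClusterIP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 1800
```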