Add ClusterIP service with SessionAffinity for MCP server backend by JAORMX · Pull Request #3992 · stacklok/toolhive

JAORMX · 2026-03-04T08:28:48Z

Summary

Adds a new regular ClusterIP service (mcp-<name>) with SessionAffinity: ClientIP for MCP server StatefulSets, alongside the existing headless service (mcp-<name>-headless)
The proxy-runner now targets this new service instead of the headless one, providing proper load balancing with session stickiness when the StatefulSet scales beyond 1 replica
Adds finalizer cleanup for the new service and Chainsaw e2e test assertions for all SSE and streamable-http scenarios

Context

The headless service bypasses kube-proxy, so DNS returns raw pod IPs with no load balancing or session affinity. If the MCP server StatefulSet scales beyond 1 replica, sessions break because requests may hit different pods. The new ClusterIP service solves this with SessionAffinity: ClientIP.

Changes

File	Change
`pkg/container/kubernetes/client.go`	New `createMCPService()` function, called from `DeployWorkload()` after `createHeadlessService()`. Moves `SSEHeadlessServiceName` assignment to the new service. Extracts `serviceFieldManager` constant.
`cmd/thv-operator/controllers/mcpserver_controller.go`	Adds cleanup of `mcp-<name>` service in `finalizeMCPServer()`
`test/e2e/chainsaw/operator/**/assert-mcpserver-svc.yaml`	4 new assertion files verifying the service exists with `sessionAffinity: ClientIP`
`test/e2e/chainsaw/operator/**/chainsaw-test.yaml`	4 updated test files adding the new assertion step

Test plan

task lint passes (0 issues)
task test passes (unit tests)
Deploy operator, create MCPServer with SSE transport, verify both services exist:
- kubectl get svc mcp-<name>-headless (headless, unchanged)
- kubectl get svc mcp-<name> (new service with sessionAffinity: ClientIP)
Chainsaw e2e tests pass for single-tenancy and multi-tenancy SSE/streamable-http scenarios

🤖 Generated with Claude Code

codecov · 2026-03-04T08:38:33Z

Codecov Report

❌ Patch coverage is 65.06024% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.74%. Comparing base (c5fdadc) to head (049784a).

Files with missing lines	Patch %	Lines
...d/thv-operator/controllers/mcpserver_controller.go	0.00%	16 Missing ⚠️
pkg/container/kubernetes/client.go	83.07%	8 Missing and 3 partials ⚠️
pkg/runtime/setup.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3992      +/-   ##
==========================================
+ Coverage   68.61%   68.74%   +0.12%     
==========================================
  Files         439      439              
  Lines       44867    44897      +30     
==========================================
+ Hits        30786    30863      +77     
+ Misses      11686    11627      -59     
- Partials     2395     2407      +12

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pkg/container/kubernetes/client.go

cmd/thv-operator/controllers/mcpserver_controller.go

test/e2e/chainsaw/operator/single-tenancy/test-scenarios/sse/assert-mcpserver-svc.yaml

jhrozek

See inline comments for specific observations and suggestions.

Pre-existing observation: RemoveWorkload in pkg/container/kubernetes/client.go only deletes the StatefulSet, not the associated services. This was already an issue for the headless service before this PR. The operator path is covered by finalizeMCPServer. Worth tracking separately if the CLI-to-K8s path is a supported workflow.

pkg/container/kubernetes/client.go

The proxy-runner connects to MCP server pods through a headless service which bypasses kube-proxy and provides no load balancing or session affinity. If the StatefulSet scales beyond 1 replica, sessions break because requests may hit different pods. Add a new regular ClusterIP service (mcp-<name>) with SessionAffinity: ClientIP alongside the existing headless service. The proxy-runner now uses this new service as its target host, getting proper load balancing with session stickiness. The headless service is retained for DNS discovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Extract shared applyService helper to reduce duplication between createHeadlessService and createMCPService - Add explicit SessionAffinityConfig.ClientIP.TimeoutSeconds (1800s) instead of relying on Kubernetes 3h default - Rename SSEHeadlessServiceName to MCPServiceName across types, setup, and client packages - Rename transportTypeRequiresHeadlessService to transportTypeRequiresBackendServices - Extract deleteIfExists helper in operator finalizer, removing nolint:gocyclo - Fix headless+NodePort conflict: skip NodePort type promotion for headless services (Kubernetes rejects clusterIP=None + type=NodePort) - Use slog.Debug instead of slog.Info for service creation success paths - Add spec.type and sessionAffinityConfig assertions to e2e tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Set the StatefulSet as owner of both the headless and MCP ClusterIP services so Kubernetes garbage-collects them when the StatefulSet is deleted. This closes the resource leak gap flagged in review. Add table-driven unit test covering both SSE and streamable-http transports, verifying service creation, session affinity config, and owner references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Assert UID, APIVersion, Controller, and BlockOwnerDeletion fields on owner references, not just Kind and Name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

yrobla · 2026-03-05T12:18:13Z

pkg/container/kubernetes/client.go


 // DeployWorkload implements runtime.Runtime.
+//
+//nolint:gocyclo


i would have prefered to refactor and avoid the nolint...

yrobla · 2026-03-05T12:19:09Z

cmd/thv-operator/controllers/mcpserver_controller.go

+func (r *MCPServerReconciler) deleteIfExists(ctx context.Context, obj client.Object, name, namespace, kind string) error {
+	err := r.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, obj)
+	if err == nil {
+		if delErr := r.Delete(ctx, obj); delErr != nil && !errors.IsNotFound(delErr) {


we have lost the logging here, is that intentional?

yrobla

approving with a pair of comments

JAORMX requested review from ChrisJBurns, amirejaz, blkt, jerm-dro, jhrozek and yrobla as code owners March 4, 2026 08:28

github-actions bot added the size/S Small PR: 100-299 lines changed label Mar 4, 2026

ChrisJBurns reviewed Mar 4, 2026

View reviewed changes

jhrozek reviewed Mar 4, 2026

View reviewed changes

JAORMX and others added 2 commits March 5, 2026 08:17

JAORMX force-pushed the feat/mcp-backend-session-affinity branch from 8cf2377 to 7a0f7be Compare March 5, 2026 06:40

JAORMX requested review from eleftherias and lujunsan as code owners March 5, 2026 06:40

github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 5, 2026

github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 5, 2026

Strengthen owner reference test assertions

049784a

Assert UID, APIVersion, Controller, and BlockOwnerDeletion fields on owner references, not just Kind and Name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Mar 5, 2026

yrobla reviewed Mar 5, 2026

View reviewed changes

yrobla approved these changes Mar 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ClusterIP service with SessionAffinity for MCP server backend#3992

Add ClusterIP service with SessionAffinity for MCP server backend#3992
JAORMX wants to merge 4 commits intomainfrom
feat/mcp-backend-session-affinity

JAORMX commented Mar 4, 2026

Uh oh!

codecov bot commented Mar 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jhrozek left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yrobla Mar 5, 2026

Uh oh!

yrobla Mar 5, 2026

Uh oh!

yrobla left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

JAORMX commented Mar 4, 2026

Summary

Context

Changes

Test plan

Uh oh!

codecov bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jhrozek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yrobla Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

yrobla Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

yrobla left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov bot commented Mar 4, 2026 •

edited

Loading